Project

General

Profile

Actions

Bug #7735

closed

OOM in Rudder when there are too many repaired reports

Added by Nicolas CHARLES almost 9 years ago. Updated over 2 years ago.

Status:
Released
Priority:
1 (highest)
Category:
Web - Compliance & node report
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
Fix check:
Regression:

Description

If there are too many repaired reports in the database, the Rudder web interface requires a lot more memory, and can lead to OOM

There can be a lot of repaired reports (for instance if you use a lot of command_execution in ncf technique editor), and witha lot of nodes, it can quicly add up to 2 millions entries (runs every 5 minutes, 10 repairs per run, 300 nodes -> 2.5 millions repaired entries)

an output of the OOM is the following, but it may be really anything at all

ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Connection reader for connection 1 to localhost:389" java.lang.OutOfMemoryError: Java heap space
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at com.unboundid.asn1.ASN1StreamReader.read(ASN1StreamReader.java:978)
    at com.unboundid.asn1.ASN1StreamReader.readType(ASN1StreamReader.java:327)
    at com.unboundid.asn1.ASN1StreamReader.beginSequence(ASN1StreamReader.java:900)
    at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1146)
    at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:257)
Exception in thread "pool-2-thread-5" java.lang.OutOfMemoryError: Java heap space
    at org.postgresql.jdbc2.TimestampUtils.loadCalendar(TimestampUtils.java:101)
    at org.postgresql.jdbc2.TimestampUtils.toTimestamp(TimestampUtils.java:333)
    at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:540)
    at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:2629)
    at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:448)
    at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:438)
    at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:92)
    at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60)
    at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(JdbcTemplate.java:446)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:456)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:464)
    at com.normation.rudder.repository.jdbc.ReportsJdbcRepository.getErrorReportsBeetween(ReportsJdbcRepository.scala:428)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:130)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at net.liftweb.actor.LiftActor$class.execTranslate(LiftActor.scala:440)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:84)
    at net.liftweb.actor.SpecializedLiftActor$class.liftedTree2$1(LiftActor.scala:288)
    at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$proc2(LiftActor.scala:287)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply$mcV$sp(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$class.around(LiftActor.scala:224)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:84)
    at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$processMailbox(LiftActor.scala:209)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiftActor.scala:173)
    at net.liftweb.actor.LAScheduler$$anonfun$9$$anon$2$$anon$3.run(LiftActor.scala:64)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Workarounds

A workaround may be to add more ram to the Rudder web app, as explained here: http://www.rudder-project.org/doc-3.2/_performance_tuning.html#_java_out_of_memory_error

It may not be sufficient in case where there is really a lot of event, so you may need a more radical workaround, as explain in comments, which is to delete the surnumerous events:

delete from ruddersysevents where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval ;

Of course, it's a rather irreversible workaround, so you may want to know what where the problems before:

select nodeid,directiveid,ruleid,component,keyvalue,msg
  from ruddersysevents
  where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval
  group by nodeid,directiveid,ruleid, component, keyvalue,msg
;

Subtasks 1 (0 open1 closed)

Bug #8195: Correct memory profile and efficiency of non-compliant-reports loggerReleasedVincent MEMBRÉ2016-04-19Actions

Related issues 1 (0 open1 closed)

Related to Rudder - User story #5617: Detecting and restarting Rudder on OOM (Out Of Memory Exception)ResolvedActions
Actions

Also available in: Atom PDF