Bug #7735
Updated by François ARMAND over 8 years ago
If there are too many repaired reports in the database, the Rudder web interface requires a lot more memory, which can lead to an out-of-memory (OOM) error
There can be a lot of repaired reports (for instance if you use a lot of command_execution in the ncf technique editor), and with a lot of nodes, it can quickly add up to more than 2 million entries (runs every 5 minutes, 10 repairs per run, 300 nodes -> 2.5 million repaired entries)
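The figures above can be checked with a quick calculation. This is only a sketch: the three-day window is an assumption (the period is not stated above), chosen because it matches the 2.5 million figure, and the column names are illustrative.
<pre>
-- one run every 5 minutes = 288 runs per day, times 10 repairs per run, times 300 nodes
select (24 * 60 / 5) * 10 * 300     as repaired_per_day,        -- 864000
       (24 * 60 / 5) * 10 * 300 * 3 as repaired_in_three_days   -- 2592000 (assumed 3-day window)
;
</pre>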
A typical output of the OOM is the following, but the error can surface almost anywhere:
<pre>
ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Connection reader for connection 1 to localhost:389" java.lang.OutOfMemoryError: Java heap space
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at com.unboundid.asn1.ASN1StreamReader.read(ASN1StreamReader.java:978)
at com.unboundid.asn1.ASN1StreamReader.readType(ASN1StreamReader.java:327)
at com.unboundid.asn1.ASN1StreamReader.beginSequence(ASN1StreamReader.java:900)
at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1146)
at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:257)
Exception in thread "pool-2-thread-5" java.lang.OutOfMemoryError: Java heap space
at org.postgresql.jdbc2.TimestampUtils.loadCalendar(TimestampUtils.java:101)
at org.postgresql.jdbc2.TimestampUtils.toTimestamp(TimestampUtils.java:333)
at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:540)
at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:2629)
at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:448)
at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:438)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:92)
at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60)
at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(JdbcTemplate.java:446)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:456)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:464)
at com.normation.rudder.repository.jdbc.ReportsJdbcRepository.getErrorReportsBeetween(ReportsJdbcRepository.scala:428)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:130)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at net.liftweb.actor.LiftActor$class.execTranslate(LiftActor.scala:440)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:84)
at net.liftweb.actor.SpecializedLiftActor$class.liftedTree2$1(LiftActor.scala:288)
at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$proc2(LiftActor.scala:287)
at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply$mcV$sp(LiftActor.scala:210)
at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
at net.liftweb.actor.SpecializedLiftActor$class.around(LiftActor.scala:224)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:84)
at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$processMailbox(LiftActor.scala:209)
at net.liftweb.actor.SpecializedLiftActor$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiftActor.scala:173)
at net.liftweb.actor.LAScheduler$$anonfun$9$$anon$2$$anon$3.run(LiftActor.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
</pre>
*Workarounds*
A workaround may be to give more RAM to the Rudder web app, as explained here: http://www.rudder-project.org/doc-3.2/_performance_tuning.html#_java_out_of_memory_error
This may not be sufficient when there are really a lot of events, in which case you may need a more radical workaround, as explained in the comments, which is to delete the surplus events:
<pre>
delete from ruddersysevents where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval ;
</pre>
Of course, this workaround is irreversible, so you may want to find out what the problems were before deleting:
<pre>
select nodeid,directiveid,ruleid,component,keyvalue,msg
from ruddersysevents
where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval
group by nodeid,directiveid,ruleid, component, keyvalue,msg
;
</pre>
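To see which nodes contribute the most repaired reports, and how many rows the delete would remove, a per-node count may also help. This is a sketch that reuses the table, columns, and filter from the queries above; repaired_count is just an illustrative alias.
<pre>
-- count the repaired reports older than one hour, per node, largest first
select nodeid, count(*) as repaired_count
from ruddersysevents
where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval
group by nodeid
order by repaired_count desc
;
</pre>
The sum of repaired_count over all nodes should match the number of rows reported by the delete statement, provided no new rows cross the one-hour threshold in between.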