Bug #7735

Updated by François ARMAND about 8 years ago

If there are too many repaired reports in the database, the Rudder web interface requires much more memory, which can lead to an OutOfMemoryError (OOM).

There can be a lot of repaired reports (for instance, if you use many command_execution calls in the ncf technique editor). With a lot of nodes, they can quickly add up to millions of entries: runs every 5 minutes, 10 repairs per run and 300 nodes quickly give about 2.5 million repaired entries.
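As a rough check of those figures, assume every run on every node produces exactly 10 repaired reports and nothing is purged. The daily volume is then:

<pre>
\frac{60\ \text{min/h}}{5\ \text{min/run}} \times 24\ \text{h} = 288\ \text{runs/day per node}
288 \times 10 \times 300 = 864{,}000\ \text{repaired reports/day}
\frac{2.5 \times 10^{6}}{864{,}000} \approx 2.9\ \text{days}
</pre>

So under these assumptions, about three days of history is enough to reach 2.5 million repaired entries.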

Here is one example of the OOM output, but the thread that fails and the resulting stack trace can vary widely:
 <pre> 
 ERROR net.liftweb.actor.ActorLogger - Actor threw an exception 
 java.lang.OutOfMemoryError: Java heap space 
 Exception in thread "Connection reader for connection 1 to localhost:389" java.lang.OutOfMemoryError: Java heap space 
	 at java.net.SocketInputStream.socketRead0(Native Method) 
	 at java.net.SocketInputStream.read(SocketInputStream.java:152) 
	 at java.net.SocketInputStream.read(SocketInputStream.java:122) 
	 at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) 
	 at java.io.BufferedInputStream.read(BufferedInputStream.java:254) 
	 at com.unboundid.asn1.ASN1StreamReader.read(ASN1StreamReader.java:978) 
	 at com.unboundid.asn1.ASN1StreamReader.readType(ASN1StreamReader.java:327) 
	 at com.unboundid.asn1.ASN1StreamReader.beginSequence(ASN1StreamReader.java:900) 
	 at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1146) 
	 at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:257) 
 Exception in thread "pool-2-thread-5" java.lang.OutOfMemoryError: Java heap space 
	 at org.postgresql.jdbc2.TimestampUtils.loadCalendar(TimestampUtils.java:101) 
	 at org.postgresql.jdbc2.TimestampUtils.toTimestamp(TimestampUtils.java:333) 
	 at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:540) 
	 at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:2629) 
	 at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:448) 
	 at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:438) 
	 at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:92) 
	 at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60) 
	 at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(JdbcTemplate.java:446) 
	 at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396) 
	 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:456) 
	 at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:464) 
	 at com.normation.rudder.repository.jdbc.ReportsJdbcRepository.getErrorReportsBeetween(ReportsJdbcRepository.scala:428) 
	 at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:130) 
	 at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) 
	 at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) 
	 at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) 
	 at net.liftweb.actor.LiftActor$class.execTranslate(LiftActor.scala:440) 
	 at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:84) 
	 at net.liftweb.actor.SpecializedLiftActor$class.liftedTree2$1(LiftActor.scala:288) 
	 at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$proc2(LiftActor.scala:287) 
	 at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply$mcV$sp(LiftActor.scala:210) 
	 at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210) 
	 at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210) 
	 at net.liftweb.actor.SpecializedLiftActor$class.around(LiftActor.scala:224) 
	 at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:84) 
	 at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$processMailbox(LiftActor.scala:209) 
	 at net.liftweb.actor.SpecializedLiftActor$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiftActor.scala:173) 
	 at net.liftweb.actor.LAScheduler$$anonfun$9$$anon$2$$anon$3.run(LiftActor.scala:64) 
	 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
	 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
	 at java.lang.Thread.run(Thread.java:745) 
 </pre> 


 *Workarounds* 


A first workaround is to give more RAM to the Rudder web application, as explained here: http://www.rudder-project.org/doc-3.2/_performance_tuning.html#_java_out_of_memory_error

This may not be sufficient when there really are a lot of events. In that case you may need a more radical workaround, as explained in the comments: delete the excess repaired events (here, those older than one hour):

 <pre> 
 delete from ruddersysevents where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval ; 
 </pre> 

Of course, this workaround is irreversible, so you may want to check what the problems were before deleting anything:

 <pre> 
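 -- one row per distinct repair, with how many times it occurred, most frequent first
 select nodeid, directiveid, ruleid, component, keyvalue, msg, count(*) as occurrences
   from ruddersysevents
   where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval
   group by nodeid, directiveid, ruleid, component, keyvalue, msg
   order by occurrences desc
 ;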
 </pre> 
