Bug #7735: OOM in Rudder when there are too many repaired reports - Rudder - Issue Tracker

Bug #7735

closed

OOM in Rudder when there are too many repaired reports

Added by Nicolas CHARLES over 9 years ago. Updated about 3 years ago.

Status:

Released

Priority:

1 (highest)

Assignee:

François ARMAND

Category:

Web - Compliance & node report

Target version:

3.0.16

Pull Request:

https://github.com/Normation/rudder/p...

Description

If there are too many repaired reports in the database, the Rudder web interface requires a lot more memory, which can lead to an out-of-memory error (OOM).

There can be a lot of repaired reports (for instance, if you use a lot of command_execution in the ncf technique editor), and with a lot of nodes they quickly add up to millions of entries: with runs every 5 minutes, 10 repairs per run, and 300 nodes, that is about 864,000 repaired entries per day, so roughly 2.5 million in three days.
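The order of magnitude quoted above can be checked with a quick calculation. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the three-day window is an assumption, since the report does not say over what period the entries accumulate.

```python
# Rough estimate of how fast result_repaired rows accumulate.
# Inputs are the figures quoted in the report; the 3-day window is an assumption.

def repaired_rows_per_day(run_interval_minutes: int, repairs_per_run: int, nodes: int) -> int:
    """Number of repaired report rows written per day across all nodes."""
    runs_per_day = (24 * 60) // run_interval_minutes  # 288 runs for a 5-minute interval
    return runs_per_day * repairs_per_run * nodes

per_day = repaired_rows_per_day(run_interval_minutes=5, repairs_per_run=10, nodes=300)
print(per_day)      # 864000
print(per_day * 3)  # 2592000
```

At this rate the table passes 2.5 million repaired entries in about three days.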

The OOM can show up as in the following output, but it may surface almost anywhere:

ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Connection reader for connection 1 to localhost:389" java.lang.OutOfMemoryError: Java heap space
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at com.unboundid.asn1.ASN1StreamReader.read(ASN1StreamReader.java:978)
    at com.unboundid.asn1.ASN1StreamReader.readType(ASN1StreamReader.java:327)
    at com.unboundid.asn1.ASN1StreamReader.beginSequence(ASN1StreamReader.java:900)
    at com.unboundid.ldap.protocol.LDAPMessage.readLDAPResponseFrom(LDAPMessage.java:1146)
    at com.unboundid.ldap.sdk.LDAPConnectionReader.run(LDAPConnectionReader.java:257)
Exception in thread "pool-2-thread-5" java.lang.OutOfMemoryError: Java heap space
    at org.postgresql.jdbc2.TimestampUtils.loadCalendar(TimestampUtils.java:101)
    at org.postgresql.jdbc2.TimestampUtils.toTimestamp(TimestampUtils.java:333)
    at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:540)
    at org.postgresql.jdbc2.AbstractJdbc2ResultSet.getTimestamp(AbstractJdbc2ResultSet.java:2629)
    at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:448)
    at com.normation.rudder.repository.jdbc.ReportsMapper$.mapRow(ReportsJdbcRepository.scala:438)
    at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:92)
    at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60)
    at org.springframework.jdbc.core.JdbcTemplate$1QueryStatementCallback.doInStatement(JdbcTemplate.java:446)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:396)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:456)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:464)
    at com.normation.rudder.repository.jdbc.ReportsJdbcRepository.getErrorReportsBeetween(ReportsJdbcRepository.scala:428)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:130)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at net.liftweb.actor.LiftActor$class.execTranslate(LiftActor.scala:440)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:84)
    at net.liftweb.actor.SpecializedLiftActor$class.liftedTree2$1(LiftActor.scala:288)
    at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$proc2(LiftActor.scala:287)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply$mcV$sp(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$net$liftweb$actor$SpecializedLiftActor$$processMailbox$1.apply(LiftActor.scala:210)
    at net.liftweb.actor.SpecializedLiftActor$class.around(LiftActor.scala:224)
    at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:84)
    at net.liftweb.actor.SpecializedLiftActor$class.net$liftweb$actor$SpecializedLiftActor$$processMailbox(LiftActor.scala:209)
    at net.liftweb.actor.SpecializedLiftActor$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiftActor.scala:173)
    at net.liftweb.actor.LAScheduler$$anonfun$9$$anon$2$$anon$3.run(LiftActor.scala:64)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Workarounds

A workaround may be to allocate more RAM to the Rudder web app, as explained here: http://www.rudder-project.org/doc-3.2/_performance_tuning.html#_java_out_of_memory_error

This may not be sufficient when there are really a lot of events, so you may need a more radical workaround, as explained in the comments, which is to delete the excess events:

delete from ruddersysevents where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval ;

Of course, this workaround is irreversible, so you may want to know what the problems were before deleting:

select nodeid,directiveid,ruleid,component,keyvalue,msg
  from ruddersysevents
  where eventtype = 'result_repaired' and executiontimestamp < now()-'1 hour'::interval
  group by nodeid,directiveid,ruleid, component, keyvalue,msg
;
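Because each of the two statements above evaluates now() on its own, the summary and the delete can cover slightly different rows if they are run separately. One way to avoid this is to compute the cutoff once and pass it to both statements in a single transaction. The sketch below illustrates the idea with Python's built-in sqlite3 module as a stand-in for PostgreSQL. The function name, the simplified table schema, the text timestamps, and the added COUNT(*) column are all assumptions made for illustration; only the column names come from the queries above.

```python
import sqlite3
from datetime import datetime, timedelta

def summarize_then_delete(conn, cutoff):
    """Summarize repaired reports older than cutoff, then delete exactly those rows."""
    with conn:  # commits on success, rolls back if either statement fails
        summary = conn.execute(
            "SELECT nodeid, directiveid, ruleid, component, keyvalue, msg, COUNT(*) "
            "FROM ruddersysevents "
            "WHERE eventtype = 'result_repaired' AND executiontimestamp < ? "
            "GROUP BY nodeid, directiveid, ruleid, component, keyvalue, msg",
            (cutoff,),
        ).fetchall()
        deleted = conn.execute(
            "DELETE FROM ruddersysevents "
            "WHERE eventtype = 'result_repaired' AND executiontimestamp < ?",
            (cutoff,),
        ).rowcount
    return summary, deleted

# Demo on an in-memory table (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ruddersysevents (nodeid TEXT, directiveid TEXT, ruleid TEXT, "
    "component TEXT, keyvalue TEXT, msg TEXT, eventtype TEXT, executiontimestamp TEXT)"
)
now = datetime(2016, 1, 1, 12, 0, 0)  # fixed "now" so the demo is deterministic

def ts(minutes_ago):
    # ISO-8601 strings of equal format compare correctly as text in SQLite
    return (now - timedelta(minutes=minutes_ago)).isoformat(timespec="seconds")

rows = [
    ("node1", "d1", "r1", "command_execution", "/bin/true", "repaired", "result_repaired", ts(180)),
    ("node1", "d1", "r1", "command_execution", "/bin/true", "repaired", "result_repaired", ts(120)),
    ("node1", "d1", "r1", "command_execution", "/bin/true", "repaired", "result_repaired", ts(10)),  # recent: kept
    ("node2", "d1", "r1", "command_execution", "/bin/true", "ok", "result_success", ts(180)),  # other type: kept
]
conn.executemany("INSERT INTO ruddersysevents VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

summary, deleted = summarize_then_delete(conn, ts(60))  # same one-hour cutoff for both statements
print(summary)  # [('node1', 'd1', 'r1', 'command_execution', '/bin/true', 'repaired', 2)]
print(deleted)  # 2
```

Against a real Rudder database the same pattern would use a PostgreSQL driver and a timestamp parameter. The point is only that the summary is captured with the same cutoff as the delete.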

Subtasks 1 (0 open — 1 closed)

Related issues 1 (0 open — 1 closed)


Updated by Nicolas CHARLES over 9 years ago

Updated by Vincent MEMBRÉ over 9 years ago

Updated by Vincent MEMBRÉ over 9 years ago

Updated by Nicolas CHARLES about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by Vincent MEMBRÉ about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by Nicolas CHARLES about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by Jonathan CLARKE about 9 years ago

Updated by François ARMAND about 9 years ago

Updated by Vincent MEMBRÉ about 9 years ago

Updated by Alexis Mousset about 3 years ago