User story #5617
closed
Detecting and restarting Rudder on OOM (Out Of Memory Exception)
Description
Hi,
I have a lot of nodes (5000 "test" nodes). The policy generation that followed their acceptance has been running since yesterday at 21:28 (12 hours now). This is not normal: it should not have taken longer than 4 hours.
In the webapp log, I can see the following exception:
[2014-10-07 21:26:35] INFO com.normation.rudder.services.policies.DeploymentServiceImpl - Start policy generation, checking updated rules
[2014-10-07 21:28:28] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 74577 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.
[2014-10-07 21:28:29] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
    at com.unboundid.util.StaticUtils.toLowerCase(StaticUtils.java:440) ~[unboundid-ldapsdk-2.3.4.jar:2.3.4]
Exception in thread "pool-3-thread-8" java.lang.OutOfMemoryError: Java heap space
    at com.unboundid.util.StaticUtils.toLowerCase(StaticUtils.java:440)
    at com.unboundid.ldap.sdk.Entry.<init>(Entry.java:309)
    at com.unboundid.ldap.sdk.Entry.<init>(Entry.java:284)
    at com.normation.ldap.sdk.LDAPEntry$.apply(LDAPEntry.scala:291)
    at com.normation.ldap.sdk.LDAPEntry$.apply(LDAPEntry.scala:293)
    at com.normation.ldap.sdk.RoLDAPConnection$$anonfun$search$1.apply(LDAPConnection.scala:303)
    at com.normation.ldap.sdk.RoLDAPConnection$$anonfun$search$1.apply(LDAPConnection.scala:303)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at com.normation.ldap.sdk.RoLDAPConnection.search(LDAPConnection.scala:303)
    at com.normation.ldap.sdk.ReadOnlyEntryLDAPConnection$class.search(LDAPConnection.scala:82)
    at com.normation.ldap.sdk.RoLDAPConnection.search(LDAPConnection.scala:283)
    at com.normation.ldap.sdk.ReadOnlyEntryLDAPConnection$class.searchOne(LDAPConnection.scala:142)
    at com.normation.ldap.sdk.RoLDAPConnection.searchOne(LDAPConnection.scala:283)
    at com.normation.rudder.services.nodes.NodeInfoServiceImpl$$anonfun$getAll$1.apply(NodeInfoService.scala:193)
    at com.normation.rudder.services.nodes.NodeInfoServiceImpl$$anonfun$getAll$1.apply(NodeInfoService.scala:189)
    at com.normation.ldap.sdk.LDAPConnectionProvider$$anonfun$map$1.apply(LDAPConnectionProvider.scala:94)
    at com.normation.ldap.sdk.LDAPConnectionProvider$$anonfun$map$1.apply(LDAPConnectionProvider.scala:93)
    at com.normation.ldap.sdk.LDAPConnectionProvider$class.withCon(LDAPConnectionProvider.scala:154)
    at com.normation.ldap.sdk.ROPooledSimpleAuthConnectionProvider.withCon(LDAPConnectionProvider.scala:369)
    at com.normation.ldap.sdk.LDAPConnectionProvider$class.map(LDAPConnectionProvider.scala:93)
    at com.normation.ldap.sdk.ROPooledSimpleAuthConnectionProvider.map(LDAPConnectionProvider.scala:369)
    at com.normation.rudder.services.nodes.NodeInfoServiceImpl.getAll(NodeInfoService.scala:189)
    at com.normation.rudder.services.policies.DeploymentService_findDependantRules_bruteForce$class.getAllNodeInfos(DeploymentService.scala:322)
    at com.normation.rudder.services.policies.DeploymentServiceImpl.getAllNodeInfos(DeploymentService.scala:276)
    at com.normation.rudder.services.policies.DeploymentService$$anonfun$2.apply(DeploymentService.scala:90)
[2014-10-07 21:28:30] INFO com.normation.rudder.batch.AsyncDeploymentAgent - One automatic policy update process is already pending, ignoring new policy update request
It looks like the generation actually failed two minutes after it started, but Rudder is stuck thinking it is still ongoing:
Updating policies (started at 2014-10-07 21:26). Another update is pending since 2014-10-07 21:26
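To illustrate the kind of detection the title asks for, here is a minimal sketch, not Rudder's actual code. It assumes the error reaches a thread's uncaught-exception handler, as it did for "pool-3-thread-8" above (the Lift actor logged and swallowed its copy, so that path would need its own hook). The class `OomWatchdog` and the `onOom` callback are hypothetical names made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: reacts when a thread dies because of an OutOfMemoryError.
final class OomWatchdog {

    /** Returns true if the throwable, or any throwable in its cause chain, is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Builds a handler that runs onOom for OOM errors and only logs anything else. */
    static Thread.UncaughtExceptionHandler handler(Runnable onOom) {
        return (thread, error) -> {
            if (isOutOfMemory(error)) {
                onOom.run();
            } else {
                System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a real deployment would pass something like
        // () -> Runtime.getRuntime().halt(1) and let the service manager restart the webapp.
        // Here the callback only sets a flag so the demo can run safely.
        AtomicBoolean restartRequested = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            // Simulated failure; no memory is actually exhausted.
            throw new OutOfMemoryError("Java heap space (simulated)");
        }, "pool-3-thread-8");
        worker.setUncaughtExceptionHandler(handler(() -> restartRequested.set(true)));
        worker.start();
        worker.join();

        System.out.println("restart requested: " + restartRequested.get());
    }
}
```

Doing real work inside the handler after an OOM is fragile because the heap may still be exhausted, which is why the sketch keeps the callback tiny. The HotSpot option `-XX:OnOutOfMemoryError="<command>"` is another route worth considering, since it runs an external command without relying on application code.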
Thanks.