Bug #14465
closedGeneration fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation
Description
We saw that exception during policy generation of a rather big :
⇨ Policy update error for process '754' at 2019-03-13 16:35:10 ⇨ Exception caught during policy update process: Task java.util.concurrent.FutureTask@4cc89444 rejected from java.util.concurrent.ThreadPoolExecutor@7a2cae84[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 24]
https://stackoverflow.com/questions/42843427/getting-error-futuretask2c7b84de-rejected-when-using-threadpool
https://stackoverflow.com/questions/8183205/what-could-be-the-cause-of-rejectedexecutionexception
Update:
to correct that problem, we need to be able to control the max parallelism in generation globally, and for desesperate cases, we need to also be able to control JS timeout.
These two parameters are quite advanced, and we will only expose them through settings
API, but not in the UI.
You can manage them with:
# for max parallelism, either use '1x' to mean "1 time the number of CPU / 2" or '3' to mean '3 threads' curl -k -H "X-API-Token: xxx" -X POST 'https://.../rudder/api/latest/settings/rudder_generation_max_parallelism' -d "value=1x" # value is in seconds curl -k -H "X-API-Token: xxx" -X POST 'https://.../rudder/api/latest/settings/rudder_generation_js_timeout' -d "value=10"
Updated by François ARMAND over 5 years ago
We had one with the following exception when the webapp is shutdown - perhaps a case when something wasn't stopped correctly:
[2019-02-27 11:25:42] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception net.liftweb.actor.PingerException: StartUpdate could not be scheduled on com.normation.rudder.batch.AbstractScheduler$StatusManager@7d6c51c at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:75) at com.normation.rudder.batch.AbstractScheduler$StatusManager$$anonfun$messageHandler$1.applyOrElse(AbstractScheduler.scala:157) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34) at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504) at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496) at com.normation.rudder.batch.AbstractScheduler$StatusManager.execTranslate(AbstractScheduler.scala:105) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240) at com.normation.rudder.batch.AbstractScheduler$StatusManager.around(AbstractScheduler.scala:105) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@25591836 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2b9ee2ed[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 1117841] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:549) at java.util.concurrent.Executors$DelegatedScheduledExecutorService.schedule(Executors.java:732) at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:73) ... 17 common frames omitted
Or similar, without any exception (also during shutdown):
210 2019-02-16 06:35:01.628:INFO:oejs.AbstractConnector:Thread-11: Stopped ServerConnector@140a1b8d{HTTP/1.1,[http/1.1]}{127.0.0.1:8080} 211 2019-02-16 06:35:01.629:INFO:oejs.session:Thread-11: node0 Stopped scavenging 212 2019-02-16 06:35:01.661:INFO:oejshC.rudder:Thread-11: Closing Spring root WebApplicationContext 213 [2019-02-16 06:35:01] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception 214 net.liftweb.actor.PingerException: com.normation.rudder.batch.StartAutomaticReporting$@6540c917 could not be scheduled on com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger@15ed 70fe 215 Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@15e3b64d rejected from java.util.concurrent.ScheduledThreadPoolEx ecutor@11dc2029[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 30] 216 2019-02-16 06:35:01.819:INFO:oejsh.ContextHandler:Thread-11: Stopped o.e.j.w.WebAppContext@2a037324{/rudder,null,UNAVAILABLE}{/rudder.war}
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.9 to 5.0.10
Updated by Nicolas CHARLES over 5 years ago
- Related to Bug #11594: JVM crash due to thread leak in CachedFindRuleNodeStatusReports added
Updated by François ARMAND over 5 years ago
- Related to Bug #14462: when javascript engine timeout, we shutdown the thread before having a chance to report about it added
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.10 to 5.0.11
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.11 to 5.0.12
Updated by Alexis Mousset over 5 years ago
- Severity set to Major - prevents use of part of Rudder | no simple workaround
Happens on 5.0.11.
[2019-05-22 14:42:52] ERROR policy.generation - Policy update error for process '3927' at 2019-05-22 14:42:52: Exception caught during policy update process: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911] java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.safeExec(JavascriptEngine.scala:616) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.singleEval(JavascriptEngine.scala:582) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.$anonfun$eval$3(JavascriptEngine.scala:562) at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:53) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at com.normation.utils.Control$.sequence(Control.scala:53) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.eval(JavascriptEngine.scala:553) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$17(DeploymentService.scala:816) at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at com.normation.utils.Control$.bestEffort(Control.scala:69) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$16(DeploymentService.scala:810) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$15(DeploymentService.scala:808) at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at com.normation.utils.Control$.bestEffort(Control.scala:69) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$14(DeploymentService.scala:805) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$11(DeploymentService.scala:798) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$5(DeploymentService.scala:772) at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at com.normation.utils.Control$.bestEffort(Control.scala:69) at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$4(DeploymentService.scala:769) at com.normation.rudder.services.policies.JsEngineProvider$.$anonfun$withNewEngine$1(JavascriptEngine.scala:345) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.$anonfun$sandboxed$1(JavascriptEngine.scala:465) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.sandboxed(JavascriptEngine.scala:461) at com.normation.rudder.services.policies.JsEngineProvider$.withNewEngine(JavascriptEngine.scala:345) at com.normation.rudder.services.policies.BuildNodeConfiguration$.buildNodeConfigurations(DeploymentService.scala:767) at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations(DeploymentService.scala:710) at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations$(DeploymentService.scala:703) at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.buildNodeConfigurations(DeploymentService.scala:474) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$60(DeploymentService.scala:217) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$56(DeploymentService.scala:211) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$51(DeploymentService.scala:205) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$48(DeploymentService.scala:193) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$44(DeploymentService.scala:181) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$42(DeploymentService.scala:180) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$40(DeploymentService.scala:179) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$39(DeploymentService.scala:178) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$37(DeploymentService.scala:177) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$34(DeploymentService.scala:174) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$30(DeploymentService.scala:171) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$26(DeploymentService.scala:168) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$22(DeploymentService.scala:165) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$15(DeploymentService.scala:157) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$11(DeploymentService.scala:154) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$8(DeploymentService.scala:149) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$5(DeploymentService.scala:144) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$4(DeploymentService.scala:143) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:137) at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:121) at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:474) at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentActor.scala:376) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34) at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:503) at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:495) at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.execTranslate(AsyncDeploymentActor.scala:366) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239) at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)
Settings to major as it leads to changes not being deployed when expected.
Updated by Alexis Mousset over 5 years ago
I saw it again happening just after again:
[2019-05-23 10:40:32] INFO policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed
Updated by François ARMAND over 5 years ago
Problem understood: when there is a timeout, the JS engine catch the error and try to stop the thread pool. We are not checking if there's other task waiting to be executed on that js threadpool.
So when one js processing took more than the 5s, and there is other js waiting for evalutation, then it raises that error in place of raising the nice error "js task timed out for ...".
See: JavascripEngine.scala
~line 625: pool.shutdownNow()
Updated by François ARMAND over 5 years ago
- Related to Bug #15011: Error at the end of a policy generation with too many nodes added
Updated by François ARMAND over 5 years ago
- Related to User story #10551: Make policy generation be node by node added
Updated by François ARMAND over 5 years ago
Moreover the timeout can happen very easely underload, once Full GC reaches a couple of seconds (even if JS computation is short)
Updated by François ARMAND over 5 years ago
- Related to Bug #14945: Missing index in ldap directory added
Updated by François ARMAND over 5 years ago
- Status changed from New to In progress
Updated by François ARMAND over 5 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2260
Updated by Rudder Quality Assistant over 5 years ago
- Assignee changed from Nicolas CHARLES to François ARMAND
Updated by François ARMAND over 5 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|f4f57a7ab1deebc687f12941ecbb4aeb44cdba73.
Updated by François ARMAND over 5 years ago
- Fix check changed from To do to Checked
Updated by Alexis Mousset over 5 years ago
- Subject changed from Exception: Task java.util.concurrent.FutureTask rejected from java.util.concurrent.ThreadPoolExecutor to Generation fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation
- Name check changed from To do to Reviewed
Updated by François ARMAND over 5 years ago
- Related to deleted (Bug #15011: Error at the end of a policy generation with too many nodes )
Updated by Vincent MEMBRÉ over 5 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 5.0.12 which was released today.