Project

General

Profile

Actions

Bug #14465

closed

Generation fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation

Added by François ARMAND over 5 years ago. Updated over 5 years ago.

Status:
Released
Priority:
N/A
Category:
Web - Config management
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
Reviewed
Fix check:
Checked
Regression:

Description

We saw that exception during policy generation of a rather big :

⇨ Policy update error for process '754' at 2019-03-13 16:35:10
⇨ Exception caught during policy update process: Task java.util.concurrent.FutureTask@4cc89444 rejected from java.util.concurrent.ThreadPoolExecutor@7a2cae84[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 24]

https://stackoverflow.com/questions/42843427/getting-error-futuretask2c7b84de-rejected-when-using-threadpool
https://stackoverflow.com/questions/8183205/what-could-be-the-cause-of-rejectedexecutionexception

Update:
to correct that problem, we need to be able to control the max parallelism in generation globally, and for desesperate cases, we need to also be able to control JS timeout.

These two parameters are quite advanced, and we will only expose them through settings API, but not in the UI.

You can manage them with:

# for max parallelism, either use '1x' to mean "1 time the number of CPU / 2" or '3' to mean '3 threads'
curl -k -H "X-API-Token: xxx"  -X POST 'https://.../rudder/api/latest/settings/rudder_generation_max_parallelism' -d "value=1x" 
# value is in seconds
curl -k -H "X-API-Token: xxx"  -X POST 'https://.../rudder/api/latest/settings/rudder_generation_js_timeout' -d "value=10" 

Subtasks 3 (0 open3 closed)

Bug #15078: Add an option to fail or not during policy generation if NodeConfigurations cannot be createdReleasedNicolas CHARLESActions
Bug #15083: Use human-readable durations in policy generation summaryReleasedFrançois ARMANDActions
Bug #15085: When a policy generation is triggered but nothing needs to be updated a log erroneously states that all nodes failedReleasedNicolas CHARLESActions

Related issues 4 (1 open3 closed)

Related to Rudder - Bug #11594: JVM crash due to thread leak in CachedFindRuleNodeStatusReportsResolvedActions
Related to Rudder - Bug #14462: when javascript engine timeout, we shutdown the thread before having a chance to report about itRejectedActions
Related to Rudder - User story #10551: Make policy generation be node by nodeNewActions
Related to Rudder - Bug #14945: Missing index in ldap directory ResolvedFrançois ARMANDActions
Actions #1

Updated by François ARMAND over 5 years ago

We had one with the following exception when the webapp is shutdown - perhaps a case when something wasn't stopped correctly:

[2019-02-27 11:25:42] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
net.liftweb.actor.PingerException: StartUpdate could not be scheduled on com.normation.rudder.batch.AbstractScheduler$StatusManager@7d6c51c
        at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:75)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager$$anonfun$messageHandler$1.applyOrElse(AbstractScheduler.scala:157)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504)
        at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager.execTranslate(AbstractScheduler.scala:105)
        at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305)
        at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241)
        at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager.around(AbstractScheduler.scala:105)
        at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227)
        at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190)
        at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@25591836 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2b9ee2ed[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 1117841]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:549)
        at java.util.concurrent.Executors$DelegatedScheduledExecutorService.schedule(Executors.java:732)
        at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:73)
        ... 17 common frames omitted

Or similar, without any exception (also during shutdown):

210 2019-02-16 06:35:01.628:INFO:oejs.AbstractConnector:Thread-11: Stopped ServerConnector@140a1b8d{HTTP/1.1,[http/1.1]}{127.0.0.1:8080}
211 2019-02-16 06:35:01.629:INFO:oejs.session:Thread-11: node0 Stopped scavenging
212 2019-02-16 06:35:01.661:INFO:oejshC.rudder:Thread-11: Closing Spring root WebApplicationContext
213 [2019-02-16 06:35:01] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
214 net.liftweb.actor.PingerException: com.normation.rudder.batch.StartAutomaticReporting$@6540c917 could not be scheduled on com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger@15ed    70fe
215 Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@15e3b64d rejected from java.util.concurrent.ScheduledThreadPoolEx    ecutor@11dc2029[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 30]
216 2019-02-16 06:35:01.819:INFO:oejsh.ContextHandler:Thread-11: Stopped o.e.j.w.WebAppContext@2a037324{/rudder,null,UNAVAILABLE}{/rudder.war}
Actions #2

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 5.0.9 to 5.0.10
Actions #3

Updated by Nicolas CHARLES over 5 years ago

  • Related to Bug #11594: JVM crash due to thread leak in CachedFindRuleNodeStatusReports added
Actions #4

Updated by François ARMAND over 5 years ago

  • Related to Bug #14462: when javascript engine timeout, we shutdown the thread before having a chance to report about it added
Actions #5

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 5.0.10 to 5.0.11
Actions #6

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 5.0.11 to 5.0.12
Actions #7

Updated by Alexis Mousset over 5 years ago

  • Severity set to Major - prevents use of part of Rudder | no simple workaround

Happens on 5.0.11.

[2019-05-22 14:42:52] ERROR policy.generation - Policy update error for process '3927' at 2019-05-22 14:42:52: Exception caught during policy update process: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911]
   at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
   at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
   at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
   at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
   at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.safeExec(JavascriptEngine.scala:616)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.singleEval(JavascriptEngine.scala:582)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.$anonfun$eval$3(JavascriptEngine.scala:562)
   at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:53)
   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at com.normation.utils.Control$.sequence(Control.scala:53)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.eval(JavascriptEngine.scala:553)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$17(DeploymentService.scala:816)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$16(DeploymentService.scala:810)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$15(DeploymentService.scala:808)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$14(DeploymentService.scala:805)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$11(DeploymentService.scala:798)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$5(DeploymentService.scala:772)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$4(DeploymentService.scala:769)
   at com.normation.rudder.services.policies.JsEngineProvider$.$anonfun$withNewEngine$1(JavascriptEngine.scala:345)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.$anonfun$sandboxed$1(JavascriptEngine.scala:465)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.sandboxed(JavascriptEngine.scala:461)
   at com.normation.rudder.services.policies.JsEngineProvider$.withNewEngine(JavascriptEngine.scala:345)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.buildNodeConfigurations(DeploymentService.scala:767)
   at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations(DeploymentService.scala:710)
   at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations$(DeploymentService.scala:703)
   at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.buildNodeConfigurations(DeploymentService.scala:474)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$60(DeploymentService.scala:217)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$56(DeploymentService.scala:211)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$51(DeploymentService.scala:205)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$48(DeploymentService.scala:193)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$44(DeploymentService.scala:181)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$42(DeploymentService.scala:180)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$40(DeploymentService.scala:179)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$39(DeploymentService.scala:178)
 at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$37(DeploymentService.scala:177)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$34(DeploymentService.scala:174)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$30(DeploymentService.scala:171)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$26(DeploymentService.scala:168)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$22(DeploymentService.scala:165)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$15(DeploymentService.scala:157)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$11(DeploymentService.scala:154)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$8(DeploymentService.scala:149)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$5(DeploymentService.scala:144)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$4(DeploymentService.scala:143)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:137)
   at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:121)
   at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:474)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentActor.scala:376)
   at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
   at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:503)
   at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:495)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.execTranslate(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226)
   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
   at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240)
   at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
   at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
  at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
   at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:748)

Settings to major as it leads to changes not being deployed when expected.

Actions #8

Updated by Alexis Mousset over 5 years ago

I saw it again happening just after again:

[2019-05-23 10:40:32] INFO  policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed
Actions #9

Updated by François ARMAND over 5 years ago

Problem understood: when there is a timeout, the JS engine catch the error and try to stop the thread pool. We are not checking if there's other task waiting to be executed on that js threadpool.

So when one js processing took more than the 5s, and there is other js waiting for evalutation, then it raises that error in place of raising the nice error "js task timed out for ...".

See: JavascripEngine.scala ~line 625: pool.shutdownNow()

Actions #10

Updated by François ARMAND over 5 years ago

  • Related to Bug #15011: Error at the end of a policy generation with too many nodes added
Actions #11

Updated by François ARMAND over 5 years ago

Actions #12

Updated by François ARMAND over 5 years ago

Moreover the timeout can happen very easely underload, once Full GC reaches a couple of seconds (even if JS computation is short)

Actions #13

Updated by François ARMAND over 5 years ago

  • Related to Bug #14945: Missing index in ldap directory added
Actions #14

Updated by François ARMAND over 5 years ago

  • Status changed from New to In progress
Actions #15

Updated by François ARMAND over 5 years ago

  • Description updated (diff)
Actions #16

Updated by François ARMAND over 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/2260
Actions #17

Updated by Rudder Quality Assistant over 5 years ago

  • Assignee changed from Nicolas CHARLES to François ARMAND
Actions #18

Updated by François ARMAND over 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #19

Updated by Vincent MEMBRÉ over 5 years ago

  • Name check set to To do
Actions #20

Updated by Vincent MEMBRÉ over 5 years ago

  • Fix check set to To do
Actions #21

Updated by François ARMAND over 5 years ago

  • Fix check changed from To do to Checked
Actions #22

Updated by Alexis Mousset over 5 years ago

  • Subject changed from Exception: Task java.util.concurrent.FutureTask rejected from java.util.concurrent.ThreadPoolExecutor to Generation fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation
  • Name check changed from To do to Reviewed
Actions #23

Updated by François ARMAND over 5 years ago

  • Related to deleted (Bug #15011: Error at the end of a policy generation with too many nodes )
Actions #24

Updated by Vincent MEMBRÉ over 5 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 5.0.12 which was released today.

Actions

Also available in: Atom PDF