Project

General

Profile

Bug #14465

Generation fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation

Added by François ARMAND 9 months ago. Updated 5 months ago.

Status:
Released
Priority:
N/A
Category:
Web - Config management
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Effort required:
Priority:
0

Description

We saw that exception during policy generation of a rather big :

⇨ Policy update error for process '754' at 2019-03-13 16:35:10
⇨ Exception caught during policy update process: Task java.util.concurrent.FutureTask@4cc89444 rejected from java.util.concurrent.ThreadPoolExecutor@7a2cae84[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 24]

https://stackoverflow.com/questions/42843427/getting-error-futuretask2c7b84de-rejected-when-using-threadpool
https://stackoverflow.com/questions/8183205/what-could-be-the-cause-of-rejectedexecutionexception

Update:
to correct that problem, we need to be able to control the max parallelism in generation globally, and for desesperate cases, we need to also be able to control JS timeout.

These two parameters are quite advanced, and we will only expose them through settings API, but not in the UI.

You can manage them with:

# for max parallelism, either use '1x' to mean "1 time the number of CPU / 2" or '3' to mean '3 threads'
curl -k -H "X-API-Token: xxx"  -X POST 'https://.../rudder/api/latest/settings/rudder_generation_max_parallelism' -d "value=1x" 
# value is in seconds
curl -k -H "X-API-Token: xxx"  -X POST 'https://.../rudder/api/latest/settings/rudder_generation_js_timeout' -d "value=10" 

Subtasks

Bug #15078: Add an option to fail or not during policy generation if NodeConfigurations cannot be createdReleasedNicolas CHARLESActions
Bug #15083: Use human-readable durations in policy generation summaryReleasedFrançois ARMANDActions
Bug #15085: When a policy generation is triggered but nothing needs to be updated a log erroneously states that all nodes failedReleasedNicolas CHARLESActions

Related issues

Related to Rudder - Bug #11594: JVM crash due to thread leak in CachedFindRuleNodeStatusReportsNewActions
Related to Rudder - Bug #14462: when javascript engine timeout, we shutdown the thread before having a chance to report about itRejectedActions
Related to Rudder - User story #10551: Make policy generation be node by nodeNewActions
Related to Rudder - Bug #14945: Missing index in ldap directory DiscussionActions

Associated revisions

Revision f4f57a7a (diff)
Added by François ARMAND 6 months ago

Fixes #14465: Exception: Task java.util.concurrent.FutureTask rejected from java.util.concurrent.ThreadPoolExecutor

History

#1

Updated by François ARMAND 9 months ago

We had one with the following exception when the webapp is shutdown - perhaps a case when something wasn't stopped correctly:

[2019-02-27 11:25:42] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
net.liftweb.actor.PingerException: StartUpdate could not be scheduled on com.normation.rudder.batch.AbstractScheduler$StatusManager@7d6c51c
        at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:75)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager$$anonfun$messageHandler$1.applyOrElse(AbstractScheduler.scala:157)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
        at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504)
        at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager.execTranslate(AbstractScheduler.scala:105)
        at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305)
        at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241)
        at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240)
        at com.normation.rudder.batch.AbstractScheduler$StatusManager.around(AbstractScheduler.scala:105)
        at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227)
        at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190)
        at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@25591836 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2b9ee2ed[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 1117841]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:549)
        at java.util.concurrent.Executors$DelegatedScheduledExecutorService.schedule(Executors.java:732)
        at net.liftweb.actor.LAPinger$.schedule(LAPinger.scala:73)
        ... 17 common frames omitted

Or similar, without any exception (also during shutdown):

210 2019-02-16 06:35:01.628:INFO:oejs.AbstractConnector:Thread-11: Stopped ServerConnector@140a1b8d{HTTP/1.1,[http/1.1]}{127.0.0.1:8080}
211 2019-02-16 06:35:01.629:INFO:oejs.session:Thread-11: node0 Stopped scavenging
212 2019-02-16 06:35:01.661:INFO:oejshC.rudder:Thread-11: Closing Spring root WebApplicationContext
213 [2019-02-16 06:35:01] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
214 net.liftweb.actor.PingerException: com.normation.rudder.batch.StartAutomaticReporting$@6540c917 could not be scheduled on com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger@15ed    70fe
215 Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@15e3b64d rejected from java.util.concurrent.ScheduledThreadPoolEx    ecutor@11dc2029[Shutting down, pool size = 1, active threads = 0, queued tasks = 5, completed tasks = 30]
216 2019-02-16 06:35:01.819:INFO:oejsh.ContextHandler:Thread-11: Stopped o.e.j.w.WebAppContext@2a037324{/rudder,null,UNAVAILABLE}{/rudder.war}
#2

Updated by Vincent MEMBRÉ 8 months ago

  • Target version changed from 5.0.9 to 5.0.10
#3

Updated by Nicolas CHARLES 8 months ago

  • Related to Bug #11594: JVM crash due to thread leak in CachedFindRuleNodeStatusReports added
#4

Updated by François ARMAND 8 months ago

  • Related to Bug #14462: when javascript engine timeout, we shutdown the thread before having a chance to report about it added
#5

Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 5.0.10 to 5.0.11
#6

Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 5.0.11 to 5.0.12
#7

Updated by Alexis MOUSSET 7 months ago

  • Severity set to Major - prevents use of part of Rudder | no simple workaround

Happens on 5.0.11.

[2019-05-22 14:42:52] ERROR policy.generation - Policy update error for process '3927' at 2019-05-22 14:42:52: Exception caught during policy update process: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4e135d5c rejected from java.util.concurrent.ThreadPoolExecutor@4d198178[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2911]
   at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
   at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
   at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
   at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
   at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.safeExec(JavascriptEngine.scala:616)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.singleEval(JavascriptEngine.scala:582)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.$anonfun$eval$3(JavascriptEngine.scala:562)
   at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:53)
   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at com.normation.utils.Control$.sequence(Control.scala:53)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine.eval(JavascriptEngine.scala:553)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$17(DeploymentService.scala:816)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$16(DeploymentService.scala:810)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$15(DeploymentService.scala:808)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$14(DeploymentService.scala:805)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$11(DeploymentService.scala:798)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$5(DeploymentService.scala:772)
   at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:69)
   at scala.collection.Iterator.foreach(Iterator.scala:937)
   at scala.collection.Iterator.foreach$(Iterator.scala:937)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at com.normation.utils.Control$.bestEffort(Control.scala:69)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.$anonfun$buildNodeConfigurations$4(DeploymentService.scala:769)
   at com.normation.rudder.services.policies.JsEngineProvider$.$anonfun$withNewEngine$1(JavascriptEngine.scala:345)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.$anonfun$sandboxed$1(JavascriptEngine.scala:465)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.JsEngine$SandboxedJsEngine$.sandboxed(JavascriptEngine.scala:461)
   at com.normation.rudder.services.policies.JsEngineProvider$.withNewEngine(JavascriptEngine.scala:345)
   at com.normation.rudder.services.policies.BuildNodeConfiguration$.buildNodeConfigurations(DeploymentService.scala:767)
   at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations(DeploymentService.scala:710)
   at com.normation.rudder.services.policies.PromiseGeneration_buildNodeConfigurations.buildNodeConfigurations$(DeploymentService.scala:703)
   at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.buildNodeConfigurations(DeploymentService.scala:474)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$60(DeploymentService.scala:217)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$56(DeploymentService.scala:211)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$51(DeploymentService.scala:205)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$48(DeploymentService.scala:193)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$44(DeploymentService.scala:181)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$42(DeploymentService.scala:180)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$40(DeploymentService.scala:179)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$39(DeploymentService.scala:178)
 at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$37(DeploymentService.scala:177)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$34(DeploymentService.scala:174)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$30(DeploymentService.scala:171)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$26(DeploymentService.scala:168)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$22(DeploymentService.scala:165)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$15(DeploymentService.scala:157)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$11(DeploymentService.scala:154)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$8(DeploymentService.scala:149)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$5(DeploymentService.scala:144)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$4(DeploymentService.scala:143)
   at net.liftweb.common.Full.flatMap(Box.scala:856)
   at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:137)
   at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:121)
   at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:474)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentActor.scala:376)
   at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
   at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:503)
   at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:495)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.execTranslate(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226)
   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
   at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240)
   at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239)
   at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
   at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
  at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:366)
   at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
   at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
   at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:748)

Settings to major as it leads to changes not being deployed when expected.

#8

Updated by Alexis MOUSSET 7 months ago

I saw it again happening just after again:

[2019-05-23 10:40:32] INFO  policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed
#9

Updated by François ARMAND 7 months ago

Problem understood: when there is a timeout, the JS engine catch the error and try to stop the thread pool. We are not checking if there's other task waiting to be executed on that js threadpool.

So when one js processing took more than the 5s, and there is other js waiting for evalutation, then it raises that error in place of raising the nice error "js task timed out for ...".

See: JavascripEngine.scala ~line 625: pool.shutdownNow()

#10

Updated by François ARMAND 6 months ago

  • Related to Bug #15011: Error at the end of a policy generation with too many nodes added
#11

Updated by François ARMAND 6 months ago

#12

Updated by François ARMAND 6 months ago

Moreover the timeout can happen very easely underload, once Full GC reaches a couple of seconds (even if JS computation is short)

#13

Updated by François ARMAND 6 months ago

  • Related to Bug #14945: Missing index in ldap directory added
#14

Updated by François ARMAND 6 months ago

  • Status changed from New to In progress
#15

Updated by François ARMAND 6 months ago

  • Description updated (diff)
#16

Updated by François ARMAND 6 months ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/2260
#17

Updated by Rudder Quality Assistant 6 months ago

  • Assignee changed from Nicolas CHARLES to François ARMAND
#18

Updated by François ARMAND 6 months ago

  • Status changed from Pending technical review to Pending release
#22

Updated by Alexis MOUSSET 6 months ago

  • Subject changed from Exception: Task java.util.concurrent.FutureTask rejected from java.util.concurrent.ThreadPoolExecutor to Generation fails with "Task FutureTask rejected from ThreadPoolExecutor" due to timeout in JS computation
#23

Updated by François ARMAND 5 months ago

  • Related to deleted (Bug #15011: Error at the end of a policy generation with too many nodes )
#24

Updated by Vincent MEMBRÉ 5 months ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 5.0.12 which was released today.

Also available in: Atom PDF