Bug #16291
Deadlock when application starts
Status: Closed
Description
We have a deadlock at boot.
It seems that the problem from #15982 is happening again. A thread dump taken when the deadlock happens is attached.
The problematic interaction only occurs when a REST request is made in parallel with startup.
Files
Updated by François ARMAND almost 5 years ago
- Related to Bug #15982: Webapp does not start when only one proc is available (deadlock) added
Updated by François ARMAND almost 5 years ago
More info: on the second run, it happens even without an incoming REST request. See attached thread dump.
Updated by François ARMAND almost 5 years ago
It seems that in both cases, we have these two blocked threads:
- one in GitArchiverUtils:
zio-default-async-5-1389774257 priority:5 - threadId:0x00007f2c7c038000 - nativeId:0x147d - nativeId (decimal):5245 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:502)
  at zio.internal.OneShot.get(OneShot.scala:79)
  - locked <0x00000000c4b75500> (a zio.internal.OneShot)
  at zio.Runtime.unsafeRunSync(Runtime.scala:70)
  at zio.Runtime.unsafeRunSync$(Runtime.scala:65)
  at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:313)
  at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:327)
  at com.normation.rudder.repository.xml.GitArchiverUtils.getRootDirectory(GitArchiverUtils.scala:108)
  at com.normation.rudder.repository.xml.GitArchiverUtils.getRootDirectory$(GitArchiverUtils.scala:106)
  at com.normation.rudder.repository.xml.GitRuleArchiverImpl.getRootDirectory$lzycompute(GitArchiverImpl.scala:68)
  - locked <0x00000000c6434380> (a com.normation.rudder.repository.xml.GitRuleArchiverImpl)
  at com.normation.rudder.repository.xml.GitRuleArchiverImpl.getRootDirectory(GitArchiverImpl.scala:68)
  at com.normation.rudder.repository.xml.ItemArchiveManagerImpl.$anonfun$exportRules$3(ItemArchiveManagerImpl.scala:158)
  at com.normation.rudder.repository.xml.ItemArchiveManagerImpl$$Lambda$2718/1520790608.apply$mcV$sp(Unknown Source)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at zio.blocking.Blocking$Service.$anonfun$effectBlocking$5(Blocking.scala:134)
  at zio.blocking.Blocking$Service$$Lambda$234/468393264.apply(Unknown Source)
  at zio.internal.FiberContext.evaluateNow(FiberContext.scala:386)
  at zio.internal.FiberContext.$anonfun$fork$2(FiberContext.scala:655)
  at zio.internal.FiberContext$$Lambda$242/1947181350.run(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
- one in CheckInitXmlExport:
zio-default-async-4-1389774257 priority:5 - threadId:0x00007f2c74202800 - nativeId:0x1476 - nativeId (decimal):5238 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:502)
  at zio.internal.OneShot.get(OneShot.scala:79)
  - locked <0x00000000c4b778d0> (a zio.internal.OneShot)
  at zio.Runtime.unsafeRunSync(Runtime.scala:70)
  at zio.Runtime.unsafeRunSync$(Runtime.scala:65)
  at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:313)
  at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:327)
  at com.normation.box$IOToBox.toBox(ZioCommons.scala:376)
  at bootstrap.liftweb.checks.CheckInitXmlExport.checks(CheckInitXmlExport.scala:72)
  at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:99)
  at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:91)
  at bootstrap.liftweb.SequentialImmediateBootStrapChecks$$Lambda$1698/1357529035.apply(Unknown Source)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:91)
  at bootstrap.liftweb.RudderConfig$.$anonfun$init$2(RudderConfig.scala:1074)
  at bootstrap.liftweb.RudderConfig$$$Lambda$1632/1651251663.apply$mcV$sp(Unknown Source)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at zio.blocking.Blocking$Service.$anonfun$effectBlocking$5(Blocking.scala:134)
  at zio.blocking.Blocking$Service$$Lambda$234/468393264.apply(Unknown Source)
  at zio.internal.FiberContext.evaluateNow(FiberContext.scala:386)
  at zio.internal.FiberContext.$anonfun$fork$2(FiberContext.scala:655)
  at zio.internal.FiberContext$$Lambda$242/1947181350.run(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
And one other thread from the Lift actor thread pool, also waiting on an unsafeRunSync, like:
pool-4-thread-2 priority:5 - threadId:0x000055c3a10ce800 - nativeId:0x38c9 - nativeId (decimal):14537 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x00000000c5f89d08> (a zio.internal.OneShot)
  at java.lang.Object.wait(Object.java:502)
  at zio.internal.OneShot.get(OneShot.scala:79)
  - locked <0x00000000c5f89d08> (a zio.internal.OneShot)
  at zio.Runtime.unsafeRunSync(Runtime.scala:70)
  at zio.Runtime.unsafeRunSync$(Runtime.scala:65)
  at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:313)
  at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:327)
  at com.normation.box$IOToBox.toBox(ZioCommons.scala:376)
  at com.normation.rudder.services.nodes.NodeInfoServiceCached.getLDAPNodeInfo(NodeInfoService.scala:546)
  at com.normation.rudder.services.nodes.NodeInfoServiceCached.getLDAPNodeInfo$(NodeInfoService.scala:528)
  at com.normation.rudder.services.nodes.NodeInfoServiceCachedImpl.getLDAPNodeInfo(NodeInfoService.scala:601)
  at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.$anonfun$queryAndChekNodeId$3(LdapQueryProcessor.scala:179)
  at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor$$Lambda$1507/1678631731.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.queryAndChekNodeId(LdapQueryProcessor.scala:176)
  at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.processOnlyId(LdapQueryProcessor.scala:231)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$computeDynGroup$4(DynGroupUpdaterService.scala:100)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1349/362963019.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$computeDynGroup$2(DynGroupUpdaterService.scala:99)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1347/1802870526.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.computeDynGroup(DynGroupUpdaterService.scala:97)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$update$2(DynGroupUpdaterService.scala:132)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1345/254443580.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at net.liftweb.common.Box$WithFilter.flatMap(Box.scala:484)
  at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.update(DynGroupUpdaterService.scala:131)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.$anonfun$applyOrElse$14(UpdateDynamicGroups.scala:257)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.$anonfun$applyOrElse$14$adapted(UpdateDynamicGroups.scala:255)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2$$Lambda$1323/1191968593.apply(Unknown Source)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.TraversableLike$$Lambda$42/1594199808.apply(Unknown Source)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.applyOrElse(UpdateDynamicGroups.scala:255)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.applyOrElse(UpdateDynamicGroups.scala:246)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
  at net.liftweb.actor.SpecializedLiftActor.execTranslate(LiftActor.scala:338)
  at net.liftweb.actor.SpecializedLiftActor.execTranslate$(LiftActor.scala:338)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$.execTranslate(UpdateDynamicGroups.scala:244)
  at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304)
  at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226)
  at net.liftweb.actor.SpecializedLiftActor$$Lambda$1206/566531775.apply$mcV$sp(Unknown Source)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240)
  at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239)
  at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$.around(UpdateDynamicGroups.scala:244)
  at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
  at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
  at net.liftweb.actor.SpecializedLiftActor$$Lambda$1205/307036850.apply$mcV$sp(Unknown Source)
  at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Or like:
pool-4-thread-14 priority:5 - threadId:0x00007f2c6c11f800 - nativeId:0x158e - nativeId (decimal):5518 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:502)
  at zio.internal.OneShot.get(OneShot.scala:79)
  - locked <0x00000000c4b735d0> (a zio.internal.OneShot)
  at zio.Runtime.unsafeRunSync(Runtime.scala:70)
  at zio.Runtime.unsafeRunSync$(Runtime.scala:65)
  at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:313)
  at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:327)
  at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:352)
  at com.normation.rudder.hooks.RunHooks$.syncRun(RunHooks.scala:292)
  at com.normation.rudder.services.policies.PromiseGeneration_Hooks.$anonfun$runPreHooks$1(DeploymentService.scala:1491)
  at com.normation.rudder.services.policies.PromiseGeneration_Hooks$$Lambda$2707/704991484.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at com.normation.rudder.services.policies.PromiseGeneration_Hooks.runPreHooks(DeploymentService.scala:1490)
  at com.normation.rudder.services.policies.PromiseGeneration_Hooks.runPreHooks$(DeploymentService.scala:1487)
  at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.runPreHooks(DeploymentService.scala:628)
  at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$14(DeploymentService.scala:218)
  at com.normation.rudder.services.policies.PromiseGenerationService$$Lambda$2702/484666814.apply(Unknown Source)
  at net.liftweb.common.Full.flatMap(Box.scala:856)
  at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:209)
  at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:145)
  at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:628)
  at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentActor.scala:381)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
  at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:503)
  at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:495)
  at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.execTranslate(AsyncDeploymentActor.scala:371)
  at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304)
  at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226)
  at net.liftweb.actor.SpecializedLiftActor$$Lambda$1230/2075478193.apply$mcV$sp(Unknown Source)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240)
  at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239)
  at com.normation.rudder.batch.AsyncDeploymentActor$DeployerAgent$.around(AsyncDeploymentActor.scala:371)
  at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226)
  at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189)
  at net.liftweb.actor.SpecializedLiftActor$$Lambda$1229/1135936028.apply$mcV$sp(Unknown Source)
  at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Updated by François ARMAND almost 5 years ago
- Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
- User visibility set to Getting started - demo | first install | Technique editor and level 1 Techniques
- Priority changed from 0 to 94
I'm setting it to critical even though having 2 CPUs works around the problem, because people are likely to test on a machine with just one CPU.
Updated by François ARMAND almost 5 years ago
The problem may be that we should never run an unsafeRunSync from anywhere other than the ZIO thread pool. => look at how we could enforce that.
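The failure mode can be reproduced without ZIO at all. Below is a minimal Java sketch (class name, helper, and timeout are illustrative, not Rudder code): a task running on a one-thread pool blocks while waiting for a second task submitted to the same pool. With a single worker, the second task never gets a thread, so the first waits forever, which is exactly what a blocking unsafeRunSync called from inside the ZIO pool does on a one-CPU machine.

```java
import java.util.concurrent.*;

public class SelfDeadlockDemo {
    // Returns false when the pool self-deadlocks: the inner task never runs,
    // so the latch never opens and await() times out.
    public static boolean finishesWithin(long millis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch done = new CountDownLatch(1);
        pool.submit(() -> {
            // Submit a second task to the SAME single-threaded pool...
            Future<?> inner = pool.submit(() -> { });
            try {
                inner.get(); // ...and block on it: the only worker is busy running *us*
            } catch (Exception ignored) { }
            done.countDown();
        });
        boolean finished = done.await(millis, TimeUnit.MILLISECONDS);
        pool.shutdownNow(); // unblock the stuck worker so the demo exits
        return finished;
    }
}
```

The same code finishes instantly with a two-thread pool, which matches the observation that a second CPU (hence a bigger default pool) hides the bug.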
Updated by François ARMAND almost 5 years ago
OK, so the problem is likely that we still have some synchronized blocks, that LAScheduler uses synchronized, and that OpenJDK 8 seems to have trouble dealing with the way they interact (OpenJDK 11 does not) => remove our synchronized blocks.
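For illustration, one possible shape of such a removal (a hedged sketch, not Rudder's actual fix): replace a synchronized memoization with lock-free publication through an AtomicReference, so no monitor is ever held. A thread that loses the race may recompute the value, but no thread ever blocks on a lock, so no lock ordering with LAScheduler's synchronized can deadlock.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class NonBlockingMemo<A> {
    private final AtomicReference<A> ref = new AtomicReference<>();
    private final Supplier<A> compute;

    public NonBlockingMemo(Supplier<A> compute) {
        this.compute = compute;
    }

    public A get() {
        A cached = ref.get();
        if (cached != null) return cached;   // fast path, no lock
        A fresh = compute.get();             // may run more than once under a race
        ref.compareAndSet(null, fresh);      // first writer wins
        return ref.get();                    // everyone sees the winning value
    }
}
```

The trade-off is duplicated computation under contention instead of blocking, which is acceptable for idempotent initializers such as resolving a git root directory.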
Updated by François ARMAND almost 5 years ago
Lazy vals are also synchronized blocks in disguise.
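In Java-flavored terms, this is roughly what the Scala 2 compiler generates for a lazy val (the method name echoes the getRootDirectory$lzycompute frame in the dump; the compute() body and the value 42 are placeholders): double-checked locking on this. The initializer runs while holding the enclosing object's monitor, so a lazy val whose initializer calls unsafeRunSync combines a monitor with a blocking wait, the same deadlock ingredients as an explicit synchronized block.

```java
public class LazyValSketch {
    private volatile boolean bitmap = false; // "already initialized" flag
    private int value;

    private int compute() { return 42; }     // placeholder initializer body

    public int rootDirectory() {
        if (!bitmap) {
            synchronized (this) {            // monitor held for the whole init
                if (!bitmap) {
                    value = compute();       // blocking work here is dangerous
                    bitmap = true;
                }
            }
        }
        return value;
    }
}
```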
Updated by François ARMAND almost 5 years ago
- Status changed from New to In progress
Updated by François ARMAND almost 5 years ago
Work in progress here: https://github.com/fanf/rudder/commit/8eafc429c4b010a000080632409d30bcb6f9e8ce
Updated by François ARMAND almost 5 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2643
Updated by François ARMAND almost 5 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|3ad20e4516c8d79a7eeeed93eff210a51281683e.
Updated by François ARMAND almost 5 years ago
- Has duplicate Bug #16290: Rudder-jetty can not start on debian9 added
Updated by François ARMAND almost 5 years ago
- Has duplicate Bug #16126: Webapp on 6.0 on Ubutun16 doesn't start/is stuck added
Updated by Alexis Mousset almost 5 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.0.0 which was released today.