Bug #15982
Webapp does not start when only one proc is available (deadlock)
Status: Closed
Description
With the current 6.0.0-beta, the webapp doesn't start if there is only one proc available for the JVM.
This is due to a deadlock involving the main thread:
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11.0.4/Native Method)
    - waiting on <0x00000000eba139e8> (a zio.internal.OneShot)
    at java.lang.Object.wait(java.base@11.0.4/Object.java:328)
    at zio.internal.OneShot.get(OneShot.scala:79)
    - waiting to re-lock in wait() <0x00000000eba139e8> (a zio.internal.OneShot)
    at zio.Runtime.unsafeRunSync(Runtime.scala:70)
    at zio.Runtime.unsafeRunSync$(Runtime.scala:65)
    at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:312)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:336)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:361)
    at bootstrap.liftweb.Boot.initPlugins(Boot.scala:531)
    at bootstrap.liftweb.Boot.boot(Boot.scala:248)
    ...
Several other threads are also blocked in an unsafeRunSync call.
Updated by François ARMAND about 5 years ago
- Status changed from New to In progress
Updated by François ARMAND about 5 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2535
Updated by François ARMAND about 5 years ago
So, the problem seems to be that cats and zio don't collaborate well when cats is not managed through zio.interop.
To explain the problem: with the current configuration, we only have one thread pool for the whole Future/cats/zio runtime, because we told the ZIO runtime to also use the Scala global scheduler, and the machine has only one proc. So if a thread blocks the single thread of that pool, everything else is blocked.
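To make that failure mode concrete, here is a minimal, self-contained sketch (hypothetical, not Rudder code; names are illustrative) of what happens when everything shares a single-threaded pool and one task synchronously waits for another task scheduled on that same pool:

// Hypothetical repro of the starvation pattern: one shared pool with a single
// worker thread (the "one proc" case), and a task that blocks on the result of
// another task submitted to the same pool.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object SingleThreadStarvation {
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1))

  def main(args: Array[String]): Unit = {
    val outer = Future {
      val inner = Future(42)            // can never start: the only thread is busy here
      Await.result(inner, Duration.Inf) // blocks the only thread => deadlock
    }
    Await.result(outer, Duration.Inf)   // never returns, like unsafeRunSync at boot
  }
}

Any synchronous wait (unsafeRunSync, unsafeRunTimed, Await.result) performed on such a starved pool never completes.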
If we only have zio things, everything goes well as long as we mark as "blocking" the things that can block (fair enough).
But if we also have one cats thing, we get a deadlock, and I don't understand why. I understand that the cats operation gets executed on the Scala global scheduler and holds the thread, so no other zio operation can run. But I don't understand why it never finishes, which is what produces the deadlock.
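As an aside, here is a minimal sketch of what "marking blocking things" means (ZIO 1.x-style API, illustrative and not taken from the Rudder sources): wrap the call in effectBlocking so it is shifted to ZIO's dedicated blocking executor instead of holding the core pool.

import zio.ZIO
import zio.blocking.{Blocking, effectBlocking}

// Assumed example: any JDBC/LDAP/file call that may block should be declared as such,
// so the fiber is moved to the blocking pool and the core thread stays free.
def readAll(path: java.nio.file.Path): ZIO[Blocking, Throwable, String] =
  effectBlocking(new String(java.nio.file.Files.readAllBytes(path), "UTF-8"))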
You can test by replacing the executeTask of FindNewReportsExecution by:
val executeTask: Long => Box[Unit] = (processId: Long) => net.liftweb.common.Full(()) // tryo{reportsExecutionService.findAndSaveExecutions(processId)}
=> boot succeeds
I wasn't able to reproduce the problem on a toy sample: if a cats operation gets the thread pool time, other zio ops are indeed stalled (as expected), but that does not prevent the cats op from terminating and the zio operations from then executing.
So, the proposed correction splits apart the zio thread pool and the cats one, so that we never reach the strange case above.
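The general shape of that separation, as a hedged sketch (illustrative names, not the actual patch): give the Lift/Future/cats side and the ZIO runtime their own executors, so a blocked cats operation can no longer starve ZIO even on a single-proc machine.

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object SeparatedPools {
  // Pool used by Lift actors, Futures and cats-effect (illustrative).
  val liftAndCatsEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

  // Dedicated pool for the ZIO runtime, independent of the one above.
  // With ZIO 1.x this would typically be handed to the runtime via
  // zio.internal.Platform.fromExecutionContext(zioEc).
  val zioEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
}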
Last thought: perhaps the problem is with the Lift `net.liftweb.actor.LAScheduler`. When we have the deadlock, we have both a cats and a zio operation waiting:
"pool-6-thread-2" #106 prio=5 os_prio=0 cpu=98.86ms elapsed=14.54s tid=0x00007fda48009800 nid=0x5a7ff in Object.wait() [0x00007fd9ea4f9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(java.base@11.0.4/Native Method) - waiting on <0x00000000ec51b008> (a zio.internal.OneShot) at java.lang.Object.wait(java.base@11.0.4/Object.java:328) at zio.internal.OneShot.get(OneShot.scala:79) - waiting to re-lock in wait() <0x00000000ec51b008> (a zio.internal.OneShot) at zio.Runtime.unsafeRunSync(Runtime.scala:70) at zio.Runtime.unsafeRunSync$(Runtime.scala:65) at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:319) at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:343) at com.normation.box$IOToBox.toBox(ZioCommons.scala:392) at com.normation.rudder.services.nodes.NodeInfoServiceCached.getLDAPNodeInfo(NodeInfoService.scala:546) at com.normation.rudder.services.nodes.NodeInfoServiceCached.getLDAPNodeInfo$(NodeInfoService.scala:528) at com.normation.rudder.services.nodes.NodeInfoServiceCachedImpl.getLDAPNodeInfo(NodeInfoService.scala:601) at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.$anonfun$queryAndChekNodeId$3(LdapQueryProcessor.scala:179) at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor$$Lambda$1682/0x0000000100e9b040.apply(Unknown Source) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.queryAndChekNodeId(LdapQueryProcessor.scala:176) at com.normation.rudder.services.queries.AcceptedNodesLDAPQueryProcessor.processOnlyId(LdapQueryProcessor.scala:231) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$computeDynGroup$4(DynGroupUpdaterService.scala:100) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1519/0x0000000100e0d840.apply(Unknown Source) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$computeDynGroup$2(DynGroupUpdaterService.scala:99) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1517/0x0000000100e0c840.apply(Unknown Source) at net.liftweb.common.Full.flatMap(Box.scala:856) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.computeDynGroup(DynGroupUpdaterService.scala:97) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.$anonfun$update$2(DynGroupUpdaterService.scala:132) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl$$Lambda$1515/0x0000000100e0a040.apply(Unknown Source) at net.liftweb.common.Full.flatMap(Box.scala:856) at net.liftweb.common.Box$WithFilter.flatMap(Box.scala:484) at com.normation.rudder.services.queries.DynGroupUpdaterServiceImpl.update(DynGroupUpdaterService.scala:131) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.$anonfun$applyOrElse$14(UpdateDynamicGroups.scala:257) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.$anonfun$applyOrElse$14$adapted(UpdateDynamicGroups.scala:255) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2$$Lambda$1499/0x0000000100df8040.apply(Unknown Source) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.TraversableLike$$Lambda$175/0x000000010053d040.apply(Unknown Source) at 
scala.collection.immutable.List.foreach(List.scala:392) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.immutable.List.map(List.scala:298) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.applyOrElse(UpdateDynamicGroups.scala:255) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$$anonfun$messageHandler$2.applyOrElse(UpdateDynamicGroups.scala:246) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at net.liftweb.actor.SpecializedLiftActor.execTranslate(LiftActor.scala:338) at net.liftweb.actor.SpecializedLiftActor.execTranslate$(LiftActor.scala:338) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$.execTranslate(UpdateDynamicGroups.scala:244) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor$$Lambda$1380/0x0000000100d8f040.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239) at com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager$LAUpdateDyngroup$.around(UpdateDynamicGroups.scala:244) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189) at net.liftweb.actor.SpecializedLiftActor$$Lambda$1379/0x0000000100d8e840.apply$mcV$sp(Unknown Source) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.4/ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.4/ThreadPoolExecutor.java:628) at java.lang.Thread.run(java.base@11.0.4/Thread.java:834) Locked ownable synchronizers: - <0x00000000c486ade8> (a java.util.concurrent.ThreadPoolExecutor$Worker) "pool-6-thread-10" #206 prio=5 os_prio=0 cpu=1.38ms elapsed=4.79s tid=0x00007fd9e4114000 nid=0x5a863 waiting on condition [0x00007fda21bf1000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@11.0.4/Native Method) - parking to wait for <0x00000000ec677758> (a cats.effect.internals.IOPlatform$OneShotLatch) at java.util.concurrent.locks.LockSupport.park(java.base@11.0.4/LockSupport.java:194) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.4/AbstractQueuedSynchronizer.java:885) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.4/AbstractQueuedSynchronizer.java:1039) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.4/AbstractQueuedSynchronizer.java:1345) at cats.effect.internals.IOPlatform$.$anonfun$unsafeResync$2(IOPlatform.scala:51) at cats.effect.internals.IOPlatform$$$Lambda$1634/0x0000000100e75040.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57) at scala.concurrent.package$.blocking(package.scala:146) at cats.effect.internals.IOPlatform$.unsafeResync(IOPlatform.scala:51) at 
cats.effect.IO.unsafeRunTimed(IO.scala:324) at cats.effect.IO.unsafeRunSync(IO.scala:239) at com.normation.rudder.db.Doobie.transactRunBox(Doobie.scala:100) at com.normation.rudder.repository.jdbc.ReportsJdbcRepository.getReportsfromId(ReportsJdbcRepository.scala:431) at com.normation.rudder.reports.execution.ReportsExecutionService.findAndSaveExecutions(ReportsExecutionService.scala:125) at com.normation.rudder.batch.FindNewReportsExecution.$anonfun$executeTask$2(FetchNodeExecutions.scala:62) at com.normation.rudder.batch.FindNewReportsExecution$$Lambda$1830/0x0000000100f12c40.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at net.liftweb.common.Tryo.tryo(Tryo.scala:22) at net.liftweb.common.Tryo.tryo$(Tryo.scala:20) at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) at net.liftweb.common.Tryo.tryo(Tryo.scala:60) at net.liftweb.common.Tryo.tryo$(Tryo.scala:60) at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) at com.normation.rudder.batch.FindNewReportsExecution.$anonfun$executeTask$1(FetchNodeExecutions.scala:62) at com.normation.rudder.batch.FindNewReportsExecution.$anonfun$executeTask$1$adapted(FetchNodeExecutions.scala:62) at com.normation.rudder.batch.FindNewReportsExecution$$Lambda$1806/0x0000000100efd040.apply(Unknown Source) at com.normation.rudder.batch.AbstractScheduler$StatusManager$TaskProcessor$$anonfun$messageHandler$2.applyOrElse(AbstractScheduler.scala:198) at com.normation.rudder.batch.AbstractScheduler$StatusManager$TaskProcessor$$anonfun$messageHandler$2.applyOrElse(AbstractScheduler.scala:191) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at net.liftweb.actor.SpecializedLiftActor.execTranslate(LiftActor.scala:338) at net.liftweb.actor.SpecializedLiftActor.execTranslate$(LiftActor.scala:338) at com.normation.rudder.batch.AbstractScheduler$StatusManager$TaskProcessor$.execTranslate(AbstractScheduler.scala:189) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:304) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor$$Lambda$1380/0x0000000100d8f040.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:240) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:239) at com.normation.rudder.batch.AbstractScheduler$StatusManager$TaskProcessor$.around(AbstractScheduler.scala:189) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:226) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:189) at net.liftweb.actor.SpecializedLiftActor$$Lambda$1379/0x0000000100d8e840.apply$mcV$sp(Unknown Source) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.4/ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.4/ThreadPoolExecutor.java:628) at java.lang.Thread.run(java.base@11.0.4/Thread.java:834) Locked ownable synchronizers: - <0x00000000ec677358> (a java.util.concurrent.ThreadPoolExecutor$Worker)
But that still doesn't explain why.
Updated by François ARMAND about 5 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|5b46327392d2629551ffc58bcebf05521519a13e.
Updated by Vincent MEMBRÉ about 5 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.0.0~beta1 which was released today.
Updated by François ARMAND about 5 years ago
- Related to Bug #16291: Deadlock when application starts added