Architecture #24968
Upgrade to ZIO 2.1.12
Description
There are a lot of performance enhancements, so we should try them.
There are two things to change in the setup, according to the release notes:
- since auto-blocking of I/O is no longer enabled by default, we need to re-enable it: https://github.com/zio/zio/releases/tag/v2.1.0 Runtime.enableAutoBlockingExecutor
- we should try "come back to CPU as soon as the effect is done": https://github.com/zio/zio/releases/tag/v2.1.2 EagerShiftBack
Updated by François ARMAND 6 months ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND 6 months ago · Edited
So, it's not trivial.
I tried to update to 2.1.2.
When Runtime.enableAutoBlockingExecutor is not set, on a machine with one core we quickly reach a deadlock, as expected.
When I try to add the aspect like this:
object IOResult {
  def attempt[A](error: String)(effect: => A): IO[SystemError, A] = {
    // In ZIO 2 blocking is automagically managed - https://github.com/zio/zio/issues/1275
    // in 2.1.0, they remove the auto-blocking. Our code is not ready for that, so we need to set it back.
    ZIO.attempt(effect).mapError(ex => SystemError(error, ex)).provideLayer(Runtime.enableAutoBlockingExecutor)
  }
  ...
Then, after a bit of run time, I get an exception:
2024-06-05 10:45:16+0200 INFO bootchecks.migration.techniques - Migrating technique 'test_ABR' to Rudder 8.0 yaml format
2024-06-05 10:45:16+0200 WARN bootchecks.migration.techniques - An error occurred when migrating technique metadata '/var/rudder/configuration-repository-8.2/techniques/ncf_techniques/test_ABR/1.0/technique.json'. Directives based on that technique may not work anymore and policy generation fail.Some files among technique.json, metadata.xml, technique.cf, technique.ps1 may have been altered. You can revert to pre-migration state using git. The error was: Inconsistency: Error when trying to read a technique with Rudder 7.3 JSON metadata descriptor: .id(missing)
[18,304s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,305s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ZScheduler-Worker-14"
[18,308s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,308s][warning][os,thread] Failed to start the native thread for java.lang.Thread "rebel-finalize"
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Thread.java)
    at java.base/java.lang.Thread.start(Thread.java:1526)
    at zio.internal.ZScheduler.$anonfun$new$2(ZScheduler.scala:51)
    at zio.internal.ZScheduler.$anonfun$new$2$adapted(ZScheduler.scala:51)
    at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
    at zio.internal.ZScheduler.<init>(ZScheduler.scala:51)
    at zio.RuntimePlatformSpecific.$anonfun$enableAutoBlockingExecutor$1(RuntimePlatformSpecific.scala:59)
    at zio.ZLayer.$anonfun$scope$17(ZLayer.scala:431)
    at zio.ZLayer.$anonfun$build$4(ZLayer.scala:129)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1032)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at com.normation.cfclerk.services.impl.GitTechniqueReader.getModifiedTechniques(GitTechniqueReader.scala:327)
    at com.normation.cfclerk.services.impl.TechniqueRepositoryImpl.update(TechniqueRepositoryImpl.scala:107)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.$anonfun$updateNcfTechniques$12(MigrateJsonTechniquesToYaml.scala:138)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$ZioRuntime$.runNowLogError(ZioCommons.scala:437)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.checks(MigrateJsonTechniquesToYaml.scala:152)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:101)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:92)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:92)
    at bootstrap.liftweb.RudderConfig$.$anonfun$init$1(RudderConfig.scala:1299)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at bootstrap.liftweb.LiftInitContextListener.contextInitialized(LiftInitContextListener.scala:135)
    at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1049)
    at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:624)
    at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:984)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:740)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1304)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:901)
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:532)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.server.Server.start(Server.java:470)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.server.Server.doStart(Server.java:415)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.runner.Runner.run(Runner.java:531)
    at org.eclipse.jetty.runner.Runner.main(Runner.java:599)
Process finished with exit code 255
I didn't try to analyze the problem, but given the context, it's likely that we simply start too many I/O threads and reach a limit.
So, it will need to be analyzed and changed.
It looks like the only solution will be to actually mark what is blocking as blocking, which is a complicated and non-trivial subject and will need to wait for 8.3.
If it matters, I tried on several JVMs, the most standard being OpenJDK jdk-21.0.1.
Updated by François ARMAND 6 months ago
Information here: https://discord.com/channels/629491597070827530/1247991421638807614
Updated by Alexis Mousset 6 months ago
- Tracker changed from Bug to Architecture
- Status changed from In progress to New
- Target version changed from 8.2.0~alpha1 to Ideas (not version specific)
- Priority deleted (0)
Updated by François ARMAND 4 months ago
- Related to Architecture #25186: Update Scala dependencies added
Updated by François ARMAND 4 months ago
With ZIO 2.1.6 and the following configuration:
object ZioRuntime {
  ...
  def unsafeRun[E, A](zio: => ZIO[Any, E, A]): A = {
    // unsafeRun will display a formatted fiber trace in case there is an error, which is likely what we want:
    // here, errors were not prevented before run, so it's a defect that should be corrected.
    Unsafe.unsafe(implicit unsafe =>
      internal.unsafe.run(zio.provideLayer(Runtime.enableAutoBlockingExecutor)).getOrThrowFiberFailure()
    )
  }
  ...
}
It almost works, but the test TestBuildNodeConfiguration throws a reproducible exception:
[25.642s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[25.643s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-8742"
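One possible reading of the first stack trace: the OutOfMemoryError is raised from ZScheduler.<init>, itself called while building the Runtime.enableAutoBlockingExecutor layer. That suggests that providing the layer on every effect, or on every unsafeRun call, may create a new scheduler with its own worker threads each time. A minimal sketch of the alternative, where one runtime is built from the layer and then reused, is given here. It assumes ZIO 2.1.x on the classpath; SharedRuntimeSketch is a hypothetical name, not Rudder's actual ZioRuntime, and this is not necessarily what the final pull request does.

```scala
import zio._

// Hypothetical stand-in for a runtime holder; not Rudder's real ZioRuntime.
object SharedRuntimeSketch {

  // The layer is built exactly once here, so a single auto-blocking
  // executor is created for the whole lifetime of this object.
  val runtime: Runtime.Scoped[Unit] =
    Unsafe.unsafe(implicit unsafe => Runtime.unsafe.fromLayer(Runtime.enableAutoBlockingExecutor))

  // Every call reuses the shared runtime instead of providing the layer again.
  def unsafeRun[E, A](zio: => ZIO[Any, E, A]): A =
    Unsafe.unsafe(implicit unsafe => runtime.unsafe.run(zio).getOrThrowFiberFailure())
}
```

With that shape, a call such as SharedRuntimeSketch.unsafeRun(ZIO.succeed(21 * 2)) runs on the shared executor. Whether this removes the thread exhaustion seen above would still need to be checked on a real Rudder run.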
Updated by François ARMAND 4 months ago
Perhaps it would be simpler to avoid the executor parameter and have a pair of attempt/attemptBlocking and use them correctly everywhere in Rudder. That's a lot of methods to annotate, though.
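A minimal sketch of what such a pair could look like, assuming ZIO 2.1.x on the classpath. ZIO.attempt and ZIO.attemptBlocking are the existing ZIO constructors; IOResultSketch and the SystemError case class below are hypothetical stand-ins for Rudder's own IOResult and SystemError, kept here only to make the example self-contained.

```scala
import zio._

// Hypothetical stand-in for Rudder's SystemError type.
final case class SystemError(msg: String, cause: Throwable)

// Hypothetical stand-in for Rudder's IOResult object.
object IOResultSketch {

  // For code that does not block a thread: runs on the default executor.
  def attempt[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attempt(effect).mapError(ex => SystemError(error, ex))

  // For code that may block a thread (file, network, database access...):
  // ZIO.attemptBlocking runs the effect on ZIO's blocking thread pool.
  def attemptBlocking[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attemptBlocking(effect).mapError(ex => SystemError(error, ex))
}
```

Call sites would then pick one or the other explicitly, for example IOResultSketch.attemptBlocking("Error when reading file")(scala.io.Source.fromFile("/some/path").mkString), where the path is only illustrative. The cost is the one noted above: every existing call site has to be reviewed and annotated.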
Updated by Clark ANDRIANASOLO about 1 month ago
- Target version changed from Ideas (not version specific) to 8.3.0~alpha1
Updated by Clark ANDRIANASOLO about 1 month ago
- Status changed from New to In progress
- Assignee changed from François ARMAND to Clark ANDRIANASOLO
Updated by Clark ANDRIANASOLO about 1 month ago
- Status changed from In progress to Pending technical review
- Assignee changed from Clark ANDRIANASOLO to François ARMAND
- Pull Request set to https://github.com/Normation/rudder/pull/5989
Updated by François ARMAND 27 days ago
- Assignee changed from François ARMAND to Clark ANDRIANASOLO
- Pull Request changed from https://github.com/Normation/rudder/pull/5989 to https://github.com/Normation/rudder/pull/6005
Updated by François ARMAND 27 days ago
- Subject changed from Upgrade to ZIO 2.1.2 to Upgrade to ZIO 2.1.12
Updated by Anonymous 22 days ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|a82e6081c03a5ef432a7dbc906cf3a8a0f1c97ee.