Architecture #24968

open

Upgrade to ZIO 2.1.12

Added by François ARMAND 6 months ago. Updated 11 days ago.

Status:
Pending release
Priority:
N/A
Category:
Architecture - Dependencies
Target version:
Effort required:
Name check:
To do
Fix check:
To do
Regression:
No

Description

There are a ton of performance enhancements, so we should try them.

There are two things to change in the set-up, according to the release notes:
- since auto-blocking for I/O is no longer enabled by default, we need to re-enable it: https://github.com/zio/zio/releases/tag/v2.1.0 Runtime.enableAutoBlockingExecutor
- we should try "come back to CPU as soon as the effect is done": https://github.com/zio/zio/releases/tag/v2.1.2 EagerShiftBack


Related issues 1 (1 open, 0 closed)

Related to Rudder - Architecture #25186: Update Scala dependencies (Pending release, François ARMAND)
Actions #1

Updated by François ARMAND 6 months ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND
Actions #2

Updated by François ARMAND 6 months ago · Edited

So, it's not trivial.

I tried to update to 2.1.2.

When Runtime.enableAutoBlockingExecutor is not set, we quickly reach a deadlock on a one-core machine, as expected.
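
To illustrate why this is expected, here is a minimal sketch that uses only `java.util.concurrent`, not ZIO itself. The pool of size 1 stands in for a one-core scheduler, and `StarvationSketch` is a hypothetical name: a task that blocks the only worker thread while waiting for a second task that needs that same thread can never complete.

```scala
import java.util.concurrent.{Callable, ExecutionException, Executors, TimeUnit, TimeoutException}

object StarvationSketch {
  // Returns true when the inner task could not run because the only worker was blocked.
  def starves(): Boolean = {
    val pool = Executors.newFixedThreadPool(1) // stands in for a one-core scheduler
    try {
      val outer = pool.submit(new Callable[String] {
        def call(): String = {
          // The inner task is queued behind the task that is waiting for it.
          val inner = pool.submit(new Callable[String] { def call(): String = "done" })
          // Blocks the only worker; the timeout makes the sketch terminate instead of hanging.
          inner.get(200, TimeUnit.MILLISECONDS)
        }
      })
      try { outer.get(); false }
      catch { case e: ExecutionException => e.getCause.isInstanceOf[TimeoutException] }
    } finally pool.shutdownNow()
  }
}
```

Without the timeout, the outer task would wait forever. Shifting blocking calls to a separate pool is what removes the cycle.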

When I try to add the layer like this:

  object IOResult {
    def attempt[A](error: String)(effect: => A): IO[SystemError, A] = {
      // In ZIO 2, blocking was automagically managed - https://github.com/zio/zio/issues/1275
      // In 2.1.0, auto-blocking was removed by default. Our code is not ready for that, so we need to set it back.
      ZIO.attempt(effect).mapError(ex => SystemError(error, ex)).provideLayer(Runtime.enableAutoBlockingExecutor)
    }
...

Then, after a bit of run time, I get an exception:

2024-06-05 10:45:16+0200 INFO  bootchecks.migration.techniques - Migrating technique 'test_ABR' to Rudder 8.0 yaml format
2024-06-05 10:45:16+0200 WARN  bootchecks.migration.techniques - An error occurred when migrating technique metadata '/var/rudder/configuration-repository-8.2/techniques/ncf_techniques/test_ABR/1.0/technique.json'. Directives based on that technique may not work anymore and policy generation fail.Some files among technique.json, metadata.xml, technique.cf, technique.ps1 may have been altered. You can revert to pre-migration state using git. The error was: Inconsistency: Error when trying to read a technique with Rudder 7.3 JSON metadata descriptor: .id(missing)
[18,304s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,305s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ZScheduler-Worker-14" 
[18,308s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,308s][warning][os,thread] Failed to start the native thread for java.lang.Thread "rebel-finalize" 
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Thread.java)
    at java.base/java.lang.Thread.start(Thread.java:1526)
    at zio.internal.ZScheduler.$anonfun$new$2(ZScheduler.scala:51)
    at zio.internal.ZScheduler.$anonfun$new$2$adapted(ZScheduler.scala:51)
    at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
    at zio.internal.ZScheduler.<init>(ZScheduler.scala:51)
    at zio.RuntimePlatformSpecific.$anonfun$enableAutoBlockingExecutor$1(RuntimePlatformSpecific.scala:59)
    at zio.ZLayer.$anonfun$scope$17(ZLayer.scala:431)
    at zio.ZLayer.$anonfun$build$4(ZLayer.scala:129)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1032)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at com.normation.cfclerk.services.impl.GitTechniqueReader.getModifiedTechniques(GitTechniqueReader.scala:327)
    at com.normation.cfclerk.services.impl.TechniqueRepositoryImpl.update(TechniqueRepositoryImpl.scala:107)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.$anonfun$updateNcfTechniques$12(MigrateJsonTechniquesToYaml.scala:138)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$ZioRuntime$.runNowLogError(ZioCommons.scala:437)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.checks(MigrateJsonTechniquesToYaml.scala:152)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:101)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:92)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:92)
    at bootstrap.liftweb.RudderConfig$.$anonfun$init$1(RudderConfig.scala:1299)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at bootstrap.liftweb.LiftInitContextListener.contextInitialized(LiftInitContextListener.scala:135)
    at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1049)
    at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:624)
    at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:984)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:740)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1304)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:901)
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:532)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.server.Server.start(Server.java:470)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.server.Server.doStart(Server.java:415)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.runner.Runner.run(Runner.java:531)
    at org.eclipse.jetty.runner.Runner.main(Runner.java:599)

Process finished with exit code 255

I didn't try to analyse the problem, but given the context, it's likely that we just spawn too many I/O threads and reach a limit.

So, it will need to be analysed and changed.
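
One hypothesis worth checking, based on the stack trace above: `ZScheduler.<init>` is called from `enableAutoBlockingExecutor` while a layer is being built, so providing the layer inside `attempt` may create a new scheduler (and its worker threads) on each call. Below is a minimal sketch of the alternative, which builds the layer once and runs every effect on the resulting runtime. `SharedRuntimeSketch` is a hypothetical name, ZIO 2.1.x on the JVM is assumed, and this has not been tested against Rudder.

```scala
import zio._

object SharedRuntimeSketch {
  // Build the layer a single time; the resulting runtime keeps the executor it configured.
  private val runtime: Runtime[Unit] =
    Unsafe.unsafe(implicit unsafe => Runtime.unsafe.fromLayer(Runtime.enableAutoBlockingExecutor))

  // Every effect reuses the same runtime, so no layer is built per call.
  def unsafeRun[E, A](zio: => ZIO[Any, E, A]): A =
    Unsafe.unsafe(implicit unsafe => runtime.unsafe.run(zio).getOrThrowFiberFailure())
}
```

With this shape, `IOResult.attempt` would not need any `provideLayer` call, since the executor is chosen where the effect is run rather than where it is described.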

It looks like the only solution will be to explicitly mark blocking code as blocking, which is a complicated, non-trivial subject and will need to wait for 8.3.

If it matters, I tried on several JVMs, the most standard being OpenJDK jdk-21.0.1.

Actions #4

Updated by Alexis Mousset 6 months ago

  • Tracker changed from Bug to Architecture
  • Status changed from In progress to New
  • Target version changed from 8.2.0~alpha1 to Ideas (not version specific)
  • Priority deleted (0)
Actions #5

Updated by François ARMAND 4 months ago

Actions #6

Updated by François ARMAND 4 months ago

With ZIO 2.1.6 and the following configuration:

object ZioRuntime {
...

    def unsafeRun[E, A](zio: => ZIO[Any, E, A]): A = {
      // unsafeRun will display a formatted fiber trace in case there is an error, which is likely what we want:
      // here, errors were not prevented before the run, so it's a defect that should be corrected.
      Unsafe.unsafe(implicit unsafe =>
        internal.unsafe.run(zio.provideLayer(Runtime.enableAutoBlockingExecutor)).getOrThrowFiberFailure()
      )
    }
...
}

It almost works but the test TestBuildNodeConfiguration throws a reproducible exception:

[25.642s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[25.643s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-8742"
Actions #7

Updated by François ARMAND 4 months ago

Perhaps it would be simpler to avoid the executor parameter and have a pair of attempt/attemptBlocking methods, and use them correctly throughout Rudder. That's a lot of methods to annotate, though.
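
A minimal sketch of what such a pair could look like. The `SystemError` case class below is a simplified stand-in for Rudder's own type, and `IOResultSketch` is a hypothetical name; `ZIO.attempt` and `ZIO.attemptBlocking` are the standard ZIO 2 constructors.

```scala
import zio._

// Simplified stand-in for Rudder's error type (assumption, for illustration only).
final case class SystemError(msg: String, cause: Throwable)

object IOResultSketch {
  // For side effects that do not block a thread: stays on the default executor.
  def attempt[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attempt(effect).mapError(ex => SystemError(error, ex))

  // For side effects that may block (file, network, JDBC...): runs on the blocking executor.
  def attemptBlocking[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attemptBlocking(effect).mapError(ex => SystemError(error, ex))
}
```

A call site would then choose explicitly, for example `IOResultSketch.attemptBlocking("Error reading file")(java.nio.file.Files.readString(path))` for file access and `IOResultSketch.attempt` for pure in-memory work. The cost is the one noted above: every existing call has to be reviewed and annotated.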

Actions #8

Updated by Clark ANDRIANASOLO 21 days ago

  • Target version changed from Ideas (not version specific) to 8.3.0~alpha1
Actions #9

Updated by Clark ANDRIANASOLO 21 days ago

  • Status changed from New to In progress
  • Assignee changed from François ARMAND to Clark ANDRIANASOLO
Actions #10

Updated by Clark ANDRIANASOLO 21 days ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Clark ANDRIANASOLO to François ARMAND
  • Pull Request set to https://github.com/Normation/rudder/pull/5989
Actions #11

Updated by François ARMAND 16 days ago

  • Assignee changed from François ARMAND to Clark ANDRIANASOLO
  • Pull Request changed from https://github.com/Normation/rudder/pull/5989 to https://github.com/Normation/rudder/pull/6005
Actions #12

Updated by François ARMAND 16 days ago

  • Subject changed from Upgrade to ZIO 2.1.2 to Upgrade to ZIO 2.1.12
Actions #13

Updated by Anonymous 11 days ago

  • Status changed from Pending technical review to Pending release