Architecture #24968

open

Upgrade to ZIO 2.1.2

Added by François ARMAND about 1 month ago. Updated about 1 month ago.

Status:
New
Priority:
N/A
Category:
Architecture - Dependencies
Effort required:
Name check:
To do
Fix check:
To do
Regression:
No

Description

There are a lot of performance enhancements, so we should try them.

There are two things to change in the set-up, according to the release notes:
- since auto-blocking for I/O is no longer enabled by default, we need to re-enable it: https://github.com/zio/zio/releases/tag/v2.1.0 Runtime.enableAutoBlockingExecutor
- we should try "come back to CPU as soon as the effect is done": https://github.com/zio/zio/releases/tag/v2.1.2 EagerShiftBack

Actions #1

Updated by François ARMAND about 1 month ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND
Actions #2

Updated by François ARMAND about 1 month ago · Edited

So, it's not trivial.

I tried to update to 2.1.2.

When Runtime.enableAutoBlockingExecutor is not set, we quickly reach a deadlock on a machine with one core, as expected.

When I try to add the layer like this:

 object IOResult {
    def attempt[A](error: String)(effect: => A): IO[SystemError, A] = {
      // In ZIO 2, blocking is automagically managed - https://github.com/zio/zio/issues/1275
      // In 2.1.0, they removed the auto-blocking. Our code is not ready for that, so we need to set it back.
      ZIO.attempt(effect).mapError(ex => SystemError(error, ex)).provideLayer(Runtime.enableAutoBlockingExecutor)
    }
...

Then, after a bit of run time, I get an exception:

2024-06-05 10:45:16+0200 INFO  bootchecks.migration.techniques - Migrating technique 'test_ABR' to Rudder 8.0 yaml format
2024-06-05 10:45:16+0200 WARN  bootchecks.migration.techniques - An error occurred when migrating technique metadata '/var/rudder/configuration-repository-8.2/techniques/ncf_techniques/test_ABR/1.0/technique.json'. Directives based on that technique may not work anymore and policy generation fail.Some files among technique.json, metadata.xml, technique.cf, technique.ps1 may have been altered. You can revert to pre-migration state using git. The error was: Inconsistency: Error when trying to read a technique with Rudder 7.3 JSON metadata descriptor: .id(missing)
[18,304s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,305s][warning][os,thread] Failed to start the native thread for java.lang.Thread "ZScheduler-Worker-14" 
[18,308s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 4k, detached.
[18,308s][warning][os,thread] Failed to start the native thread for java.lang.Thread "rebel-finalize" 
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.base/java.lang.Thread.start0(Thread.java)
    at java.base/java.lang.Thread.start(Thread.java:1526)
    at zio.internal.ZScheduler.$anonfun$new$2(ZScheduler.scala:51)
    at zio.internal.ZScheduler.$anonfun$new$2$adapted(ZScheduler.scala:51)
    at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1323)
    at zio.internal.ZScheduler.<init>(ZScheduler.scala:51)
    at zio.RuntimePlatformSpecific.$anonfun$enableAutoBlockingExecutor$1(RuntimePlatformSpecific.scala:59)
    at zio.ZLayer.$anonfun$scope$17(ZLayer.scala:431)
    at zio.ZLayer.$anonfun$build$4(ZLayer.scala:129)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1032)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at com.normation.cfclerk.services.impl.GitTechniqueReader.getModifiedTechniques(GitTechniqueReader.scala:327)
    at com.normation.cfclerk.services.impl.TechniqueRepositoryImpl.update(TechniqueRepositoryImpl.scala:107)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.$anonfun$updateNcfTechniques$12(MigrateJsonTechniquesToYaml.scala:138)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$ZioRuntime$.runNowLogError(ZioCommons.scala:437)
    at bootstrap.liftweb.checks.migration.MigrateJsonTechniquesToYaml.checks(MigrateJsonTechniquesToYaml.scala:152)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:101)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:92)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
    at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:92)
    at bootstrap.liftweb.RudderConfig$.$anonfun$init$1(RudderConfig.scala:1299)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at zio.ZIOCompanionVersionSpecific.$anonfun$attempt$1(ZIOCompanionVersionSpecific.scala:99)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:960)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1025)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1120)
    at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1053)
    at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:403)
    at zio.internal.FiberRuntime.start(FiberRuntime.scala:1343)
    at zio.Runtime$UnsafeAPIV1.runOrFork(Runtime.scala:162)
    at zio.Runtime$UnsafeAPIV1.run(Runtime.scala:134)
    at com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:447)
    at zio.Unsafe$.unsafe(Unsafe.scala:37)
    at com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:447)
    at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:430)
    at com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:456)
    at bootstrap.liftweb.LiftInitContextListener.contextInitialized(LiftInitContextListener.scala:135)
    at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:1049)
    at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:624)
    at org.eclipse.jetty.server.handler.ContextHandler.contextInitialized(ContextHandler.java:984)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:740)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1304)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:901)
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:532)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:121)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:171)
    at org.eclipse.jetty.server.Server.start(Server.java:470)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:89)
    at org.eclipse.jetty.server.Server.doStart(Server.java:415)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.eclipse.jetty.runner.Runner.run(Runner.java:531)
    at org.eclipse.jetty.runner.Runner.main(Runner.java:599)

Process finished with exit code 255

I didn't try to analyse the problem, but given the context, it's likely that we simply start too many I/O threads and reach a limit.

So, it will need to be analysed and changed.
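
One possible reading of the stack trace above, not verified here: ZScheduler.<init> is called while the Runtime.enableAutoBlockingExecutor layer is being built, so providing that layer inside every IOResult.attempt call may create a fresh scheduler, with its own worker threads, for each effect. A minimal sketch of installing the layer once on a shared runtime instead is given here. SharedRuntime and its runNow helper are hypothetical names for illustration, not the actual ZioRuntime code in ZioCommons.scala.

```scala
import zio._

// Hypothetical stand-in for the project's runtime holder: the layer is built
// exactly once, when this object is initialised, instead of once per effect.
object SharedRuntime {
  // Runtime.enableAutoBlockingExecutor is the layer named in the ZIO 2.1.0 release notes.
  private val layer: ZLayer[Any, Nothing, Unit] = Runtime.enableAutoBlockingExecutor

  // Build a runtime from the layer; every effect run through it shares one executor.
  val runtime: Runtime.Scoped[Unit] =
    Unsafe.unsafe(implicit unsafe => Runtime.unsafe.fromLayer(layer))

  // Assumed helper: run an effect synchronously on the shared runtime,
  // rethrowing any failure.
  def runNow[A](effect: IO[Throwable, A]): A =
    Unsafe.unsafe { implicit unsafe =>
      runtime.unsafe.run(effect).getOrThrowFiberFailure()
    }
}
```

With something like this in place, IOResult.attempt would no longer call provideLayer itself. Whether that actually removes the thread exhaustion still needs to be checked.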

It looks like the only solution will be to explicitly mark blocking code as blocking, which is a complicated, non-trivial subject and will need to wait for 8.3.
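
To give an idea of what marking blocking code could look like, here is a minimal sketch using ZIO.attemptBlocking, which runs the effect on the blocking thread pool. The attemptBlocking method on IOResult is a hypothetical addition, and SystemError is a simplified stand-in for the real error type.

```scala
import zio._

// Simplified stand-in for the project's SystemError type (assumption).
final case class SystemError(msg: String, cause: Throwable)

object IOResult {
  // CPU-bound or otherwise non-blocking code: stays on the default executor.
  def attempt[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attempt(effect).mapError(ex => SystemError(error, ex))

  // Hypothetical variant for code known to block (file I/O, JDBC, git, ...):
  // ZIO.attemptBlocking shifts the effect to the dedicated blocking executor.
  def attemptBlocking[A](error: String)(effect: => A): IO[SystemError, A] =
    ZIO.attemptBlocking(effect).mapError(ex => SystemError(error, ex))
}
```

The hard part is not the helper itself but auditing every call site to decide which of the two variants applies.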

If it matters, I tried on several JVMs, the most standard being OpenJDK jdk-21.0.1.

Actions #4

Updated by Alexis Mousset about 1 month ago

  • Tracker changed from Bug to Architecture
  • Status changed from In progress to New
  • Target version changed from 8.2.0~alpha1 to Ideas (not version specific)
  • Priority deleted (0)