Bug #16557 (closed)
CachedFindRuleNodeStatusReports is a huge source of contention
Description
As soon as we have a loaded Rudder, with regular policy generations, incoming runs, and users clicking in the UI, the compliance cache becomes a HUGE source of contention. This is because we take a lock on both reads and writes.
There is no reason to do so, because we don't mind showing slightly outdated compliance for nodes: we have a timestamp on it and are able to tell the user. But we really do care about displaying something, and about not blocking the whole app because of that.
So we can have a logic similar to:
- we have an unlocked cache, for both writes and reads
- all writes pass through an invalidation queue (a set of nodes to update). It asynchronously computes compliance for the updated nodes and updates the cache values. I'm not even sure we need a lock here.
- all reads are free from locking.
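The logic above can be sketched as follows. This is only a minimal illustration, not Rudder's actual implementation: `NodeId`, `Compliance`, `UnlockedComplianceCache`, `compute` and `processPending` are hypothetical names, and the sketch assumes a single background worker (thread or fiber) calls `processPending` in a loop.

```scala
import java.time.Instant
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-ins for the real node id and compliance types.
final case class NodeId(value: String)
final case class Compliance(percent: Double, computedAt: Instant)

// `compute` stands for the expensive compliance computation for a set of nodes.
final class UnlockedComplianceCache(compute: Set[NodeId] => Map[NodeId, Compliance]) {

  // Immutable snapshot behind an atomic reference: readers never block.
  private val cache = new AtomicReference(Map.empty[NodeId, Compliance])

  // Pending invalidations; producers only enqueue and return immediately.
  private val pending = new ConcurrentLinkedQueue[NodeId]()

  // Read path: no lock. Values may be slightly outdated, and `computedAt`
  // lets the caller tell the user how old they are.
  def get(ids: Set[NodeId]): Map[NodeId, Compliance] = {
    val snapshot = cache.get()
    snapshot.filter { case (id, _) => ids.contains(id) }
  }

  // Write path: only records which nodes need a refresh.
  def invalidate(ids: Set[NodeId]): Unit = ids.foreach(id => pending.offer(id))

  // Drains the queue, recomputes compliance for the queued nodes and
  // publishes the result with an atomic update. Returns the number of
  // distinct nodes that were refreshed.
  def processPending(): Int = {
    val ids = Iterator.continually(pending.poll()).takeWhile(_ != null).toSet
    if (ids.nonEmpty) {
      val fresh = compute(ids)
      cache.updateAndGet(old => old ++ fresh)
    }
    ids.size
  }
}
```

With this shape, `invalidate` and `get` never wait on each other: the only coordination is the compare-and-set inside `updateAndGet`, which retries instead of blocking.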
Typical contention:
"zio-rudder-mix-10" - Thread t@71
   java.lang.Thread.State: BLOCKED
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.$anonfun$invalidate$1(ReportingServiceImpl.scala:215)
    - waiting to lock <2cf07dc8> (a com.normation.rudder.services.reports.CachedReportingServiceImpl) owned by "pool-5-thread-12" t@483
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports$$Lambda$2089/243809471.apply(Unknown Source)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
    at scala.concurrent.package$.blocking(package.scala:146)
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.invalidate(ReportingServiceImpl.scala:214)
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.invalidate$(ReportingServiceImpl.scala:214)
    at com.normation.rudder.services.reports.CachedReportingServiceImpl.invalidate(ReportingServiceImpl.scala:94)
    at com.normation.rudder.reports.execution.ReportsExecutionService.$anonfun$new$2(ReportsExecutionService.scala:87)
    at com.normation.rudder.reports.execution.ReportsExecutionService$$Lambda$2086/719053554.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at zio.internal.FiberContext.evaluateNow(FiberContext.scala:404)
    at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:602)
    at zio.internal.FiberContext$$Lambda$230/219812012.run(Unknown Source)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"scala-execution-context-global-597" - Thread t@597
   java.lang.Thread.State: BLOCKED
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.$anonfun$checkAndUpdateCache$1(ReportingServiceImpl.scala:229)
    - waiting to lock <2cf07dc8> (a com.normation.rudder.services.reports.CachedReportingServiceImpl) owned by "pool-5-thread-12" t@483
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports$$Lambda$2096/1223677606.apply(Unknown Source)
    at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$1$$anon$2.block(ExecutionContextImpl.scala:75)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
    at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$1.blockOn(ExecutionContextImpl.scala:87)
    at scala.concurrent.package$.blocking(package.scala:146)
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.checkAndUpdateCache(ReportingServiceImpl.scala:228)
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.findRuleNodeStatusReports(ReportingServiceImpl.scala:283)
    at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.findRuleNodeStatusReports$(ReportingServiceImpl.scala:280)
    at com.normation.rudder.services.reports.CachedReportingServiceImpl.findRuleNodeStatusReports(ReportingServiceImpl.scala:94)
    at com.normation.rudder.web.services.AsyncComplianceService$NodeCompliance.computeCompliance(AsyncComplianceService.scala:122)
    at com.normation.rudder.web.services.AsyncComplianceService$ComplianceBy.$anonfun$futureCompliance$1(AsyncComplianceService.scala:100)
    at com.normation.rudder.web.services.AsyncComplianceService$ComplianceBy$$Lambda$7099/308232004.apply(Unknown Source)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.concurrent.Future$$$Lambda$1629/561133045.apply(Unknown Source)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.Future$$Lambda$1631/416579056.apply(Unknown Source)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.Promise$$Lambda$1635/643434827.apply(Unknown Source)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)