Bug #16557

closed

CachedFindRuleNodeStatusReports is a huge source of contention

Added by François ARMAND almost 5 years ago. Updated almost 5 years ago.

Status:
Released
Priority:
N/A
Category:
Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
Checked
Regression:

Description

As soon as we have a loaded Rudder, with regular policy generation, incoming runs, and users clicking in the UI, the compliance cache becomes a HUGE source of contention. This is because we take a lock on both reads and writes.

There is no reason to do so, because we don't mind showing slightly outdated compliance for nodes: we have a timestamp on it and are able to tell the user. But we really do care about displaying something, and about not blocking the whole app because of that.

So we can have a logic similar to:

- we have an unlocked cache, both for writes and reads
- all writes pass through an invalidation queue (a set of nodes to update). It asynchronously computes compliance for the updated nodes and updates the cache values. I'm not even sure we need a lock here.
- all reads are free from locking.
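
The logic above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual fix: `NodeId` and `Compliance` are simplified stand-ins for the real Rudder types, and `ComplianceCache`, `compute`, and `processPending` are hypothetical names. The cache is an immutable map held in an `AtomicReference`, so reads never take a lock and writers only enqueue node IDs.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, simplified stand-ins for the real Rudder types.
final case class NodeId(value: String)
final case class Compliance(percent: Int, computedAt: Long)

// `compute` stands in for the expensive compliance computation (an assumption).
final class ComplianceCache(compute: NodeId => Compliance) {
  // Immutable snapshot swapped atomically: readers never take a lock.
  private val cache = new AtomicReference(Map.empty[NodeId, Compliance])
  // Invalidation queue: nodes whose compliance must be recomputed.
  private val pending = new ConcurrentLinkedQueue[NodeId]()

  // Lock-free read. The value may be slightly outdated; `computedAt`
  // lets the caller tell the user how old it is.
  def get(id: NodeId): Option[Compliance] = cache.get().get(id)

  // Writers only enqueue node IDs; they never block on the cache itself.
  def invalidate(ids: Iterable[NodeId]): Unit = ids.foreach(id => pending.add(id))

  // Intended to be called by a single background worker:
  // drain the queue, recompute, then publish the new values atomically.
  // Returns the number of distinct nodes that were recomputed.
  def processPending(): Int = {
    val ids     = Iterator.continually(pending.poll()).takeWhile(_ != null).toSet
    val updates = ids.map(id => id -> compute(id)).toMap
    if (updates.nonEmpty) {
      cache.updateAndGet(current => current ++ updates)
    }
    ids.size
  }
}
```

In such a design, a single background thread (or fiber) would call `processPending()` in a loop. Readers keep seeing the previous values until the new ones are published, which matches the goal of showing slightly outdated compliance rather than blocking.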

Typical contention:

"zio-rudder-mix-10" - Thread t@71
   java.lang.Thread.State: BLOCKED
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.$anonfun$invalidate$1(ReportingServiceImpl.scala:215)
        - waiting to lock <2cf07dc8> (a com.normation.rudder.services.reports.CachedReportingServiceImpl) owned by "pool-5-thread-12" t@483
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports$$Lambda$2089/243809471.apply(Unknown Source)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
        at scala.concurrent.package$.blocking(package.scala:146)
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.invalidate(ReportingServiceImpl.scala:214)
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.invalidate$(ReportingServiceImpl.scala:214)
        at com.normation.rudder.services.reports.CachedReportingServiceImpl.invalidate(ReportingServiceImpl.scala:94)
        at com.normation.rudder.reports.execution.ReportsExecutionService.$anonfun$new$2(ReportsExecutionService.scala:87)
        at com.normation.rudder.reports.execution.ReportsExecutionService$$Lambda$2086/719053554.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at zio.internal.FiberContext.evaluateNow(FiberContext.scala:404)
        at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:602)
        at zio.internal.FiberContext$$Lambda$230/219812012.run(Unknown Source)
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"scala-execution-context-global-597" - Thread t@597
   java.lang.Thread.State: BLOCKED
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.$anonfun$checkAndUpdateCache$1(ReportingServiceImpl.scala:229)
        - waiting to lock <2cf07dc8> (a com.normation.rudder.services.reports.CachedReportingServiceImpl) owned by "pool-5-thread-12" t@483
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports$$Lambda$2096/1223677606.apply(Unknown Source)
        at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$1$$anon$2.block(ExecutionContextImpl.scala:75)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
        at scala.concurrent.impl.ExecutionContextImpl$DefaultThreadFactory$$anon$1.blockOn(ExecutionContextImpl.scala:87)
        at scala.concurrent.package$.blocking(package.scala:146)
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.checkAndUpdateCache(ReportingServiceImpl.scala:228)
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.findRuleNodeStatusReports(ReportingServiceImpl.scala:283)
        at com.normation.rudder.services.reports.CachedFindRuleNodeStatusReports.findRuleNodeStatusReports$(ReportingServiceImpl.scala:280)
        at com.normation.rudder.services.reports.CachedReportingServiceImpl.findRuleNodeStatusReports(ReportingServiceImpl.scala:94)
        at com.normation.rudder.web.services.AsyncComplianceService$NodeCompliance.computeCompliance(AsyncComplianceService.scala:122)
        at com.normation.rudder.web.services.AsyncComplianceService$ComplianceBy.$anonfun$futureCompliance$1(AsyncComplianceService.scala:100)
        at com.normation.rudder.web.services.AsyncComplianceService$ComplianceBy$$Lambda$7099/308232004.apply(Unknown Source)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.concurrent.Future$$$Lambda$1629/561133045.apply(Unknown Source)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.Future$$Lambda$1631/416579056.apply(Unknown Source)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.Promise$$Lambda$1635/643434827.apply(Unknown Source)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)


Subtasks 2 (0 open, 2 closed)

Bug #16565: New cache doesn't return all compliance - Released - Nicolas CHARLES
Bug #16612: New compliance cache must return expired data - Released - Nicolas CHARLES

Related issues 2 (0 open, 2 closed)

Related to Rudder - Bug #16382: Improve performance of policy generation writer - Released - Nicolas CHARLES
Related to Rudder - Bug #17341: Compliance data for reporting plugin are not generated anymore - Released - François ARMAND
Actions #1

Updated by François ARMAND almost 5 years ago

  • Status changed from New to In progress
Actions #2

Updated by François ARMAND almost 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/2717
Actions #3

Updated by François ARMAND almost 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #4

Updated by François ARMAND almost 5 years ago

  • Related to Bug #16382: Improve performance of policy generation writer added
Actions #5

Updated by François ARMAND almost 5 years ago

  • Fix check changed from To do to Checked
Actions #6

Updated by Vincent MEMBRÉ almost 5 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.0.3, which was released today.

Actions #7

Updated by Vincent MEMBRÉ over 4 years ago

  • Related to Bug #17341: Compliance data for reporting plugin are not generated anymore added