Bug #15775 (closed)

Bug #15205: Use blocking threadpool for unsafeRun()

Performance problems with ZIO

Added by François ARMAND about 5 years ago. Updated about 5 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: To do
Regression:
Description

We found an order-of-magnitude performance decrease with the new ZIO backend.

The problem seems to come (at least) from:

- slow LDAP connection reads. This seems linked to ZIO, but we are not sure why (most likely the cost of interpreting ZIO programs in that context). It leads to dynamic group updates taking more than 10x longer in 5.1 than in 5.0 (a sketch of shifting such blocking reads onto ZIO's blocking pool follows the references below);
- thousands of short-lived threads (most of them linked to cats.something), which is extremely expensive.

Some references:
- Blocking as default `effect` import?: https://github.com/zio/zio/issues/1275
- Zio run blocking backwardscompatible code: https://stackoverflow.com/questions/55380840/zio-run-blocking-backwardscompatible-code
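
A minimal sketch of the direction these references point to, assuming ZIO 1.x (`zio.blocking.effectBlocking` and `zio.blocking.blocking`); `readEntries` is a hypothetical stand-in for a synchronous LDAP read, not Rudder's actual API:

    import zio._
    import zio.blocking._

    object BlockingWrapperSketch {

      // Shift a synchronous, blocking call (e.g. an LDAP read) onto ZIO's dedicated,
      // unbounded blocking thread pool, so it does not monopolise the bounded CPU pool
      // used to interpret the other fibers.
      def wrapBlocking[A](readEntries: => A): ZIO[Blocking, Throwable, A] =
        effectBlocking(readEntries)

      // An already-written effect can also be relocated wholesale:
      def onBlockingPool[R <: Blocking, E, A](io: ZIO[R, E, A]): ZIO[R, E, A] =
        blocking(io)
    }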

Actions #1

Updated by François ARMAND about 5 years ago

  • Status changed from New to In progress
Actions #3

Updated by François ARMAND about 5 years ago

  • Description updated (diff)
Actions #4

Updated by François ARMAND about 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/2486
Actions #5

Updated by François ARMAND about 5 years ago

  • Subject changed from Performance problems with ZIO LDAP connections to Performance problems with ZIO

Some more information: here are the timings (in ms) for a generation, in 5.0 vs 5.1:

Step                            5.0        5.1        Ratio

Number of nodes updated         4549       4549

Run pre-gen scripts hooks       0          358        x --
Run pre-gen modules hooks       0          4          x --
Fetch all information           1437       6156       x 4.28
Historize names                 3282       5720       x 1.74
Build current rule values       839        1515       x 1.81
Build target configuration      140241     956087     x 6.82
- node context                  16455      79897      x 4.86
- other things                  123786     876190     x 7.08
Write node configurations       401538     2307613    x 5.75
Save expected reports           13471      13323      x 0.99
Run post generation hooks       129        267        x 2.07

The first two can be considered outliers. The problem is not specific to I/O: one of the most I/O-intensive tasks, fetching data, is "only" x4, while one of the least I/O-intensive tasks, "other things" in build target configuration, is x7!

So we need to understand whether the problem lies with parallelism or, worse, with CPU-bound tasks.

=> add timing information in 5.0 and 5.1 to be able to analyse where the problem comes from.
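
A minimal sketch of the kind of timing instrumentation meant here, assuming ZIO 1.x (`.timed` needs a `Clock` in the environment); the `logged` helper and the label are illustrative, not Rudder's actual code:

    import zio._
    import zio.clock.Clock

    object TimedStepSketch {

      // Wrap a generation step so that its elapsed time is printed with a label,
      // allowing a step-by-step comparison between 5.0 and 5.1.
      def logged[R, E, A](label: String)(step: ZIO[R, E, A]): ZIO[R with Clock, E, A] =
        step.timed.flatMap { case (elapsed, result) =>
          UIO(println(s"$label: ${elapsed.toMillis} ms")).as(result)
        }

      // usage: logged("Build target configuration")(buildTargetConfiguration)
    }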

Actions #6

Updated by Nicolas CHARLES about 5 years ago

Protocol used for the new tests:
  • set the policy_generation and debug_timing loggers to Trace
  • ran the following commands on both the 5.0 and 5.1 logs (the summary statistics below are then derived from the resulting files, as sketched after the commands):
    
    sed -i "/is a Dsc node/d"  *
    sed -i "/discarding less priorize/d"  *
    sed -i "/Not updating changes by rule/d"  *
    sed -i "/their promises are going to be written/d"  *
    sed -i "/does not define any components/d"  *
    sed -i "/Not updating non-modified node/d"  *
    
    grep "debug_timing" * >> debug_timing.log
    
    grep "getNodeStatusReports" debug_timing.log >> getNodeStatusReports.log
    sed -i "/getNodeStatusReports/d"  debug_timing.log
    
    grep "computing policy status" debug_timing.log >> computePolicyStatus.log
    sed -i "/computing policy status/d"  debug_timing.log
    
    grep "Compute complianceForRun" debug_timing.log >> complianceForRun.log
    sed -i "/Compute complianceForRun/d"  debug_timing.log
    
    grep "compute compliance reports" debug_timing.log >> complianceReports.log
    sed -i "/compute compliance reports/d"  debug_timing.log
    
    grep "global cost" debug_timing.log >> globalCost.log
    sed -i "/global cost/d"  debug_timing.log
    
    grep "Cache for nodes info expire" debug_timing.log >> cacheExpire.log
    sed -i "/Cache for nodes info expire/d"  debug_timing.log
    
    grep "policy.generation" * >> policy_generation.log
    
    sed -i "/to merge multi-directives/d" policy_generation.log
    sed -i "/directive for unique techniques/d" policy_generation.log
    sed -i "/indep multi-directives/d" policy_generation.log
    sed -i "/Resolved policies for/d" policy_generation.log
    
    # Policy status comparision
    sed -i 's/.*: //' computePolicyStatus.log
    sed -i 's/ms//' computePolicyStatus.log
    
    sed -i 's/.*: //' getNodeStatusReports.log
    sed -i 's/ms//' getNodeStatusReports.log
    
    sed -i 's/.*: //' complianceReports.log
    sed -i 's/ms//' complianceReports.log
    
    sed -i 's/.*: //' cacheExpire.log
    sed -i 's/ms//' cacheExpire.log
    
    

(unfortunately, the "global cost" and "Compute complianceForRun" entries don't exist in 5.1)
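
For reference, a minimal sketch (plain Scala; the object and method names are assumptions) of how the summary rows below can be derived from the one-duration-per-line files produced by the protocol above:

    import scala.io.Source
    import scala.util.Try

    object LogStatsSketch {
      // Compute sum, number of entries, average, mode, min and max (in ms)
      // from a file containing one duration per line, e.g. getNodeStatusReports.log.
      def stats(path: String): Unit = {
        val src    = Source.fromFile(path)
        val values = try src.getLines().flatMap(l => Try(l.trim.toInt).toOption).toList
                     finally src.close()
        if (values.nonEmpty) {
          val sum  = values.sum
          val mode = values.groupBy(identity).maxBy(_._2.size)._1
          println(f"sum=$sum entries=${values.size} avg=${sum.toDouble / values.size}%.4f " +
                  f"mode=$mode min=${values.min} max=${values.max}")
        }
      }
    }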

Result for getNodeStatusReports (times in ms)
Version   Sum      Number of entries   Average    Mode   Min   Max
5.0       152087   13773               11.0424    2      0     532
5.1       17822    9146                1.94861    1      0     461

Result for computePolicyStatus (times in ms)
Version   Sum      Number of entries   Average    Mode   Min   Max
5.0       5397     13773               0.391854   0      0     341
5.1       3597     9146                0.393287   0      0     541

Result for complianceReports (times in ms)
Version   Sum      Number of entries   Average    Mode   Min   Max
5.0       317721   126                 2521.6     6      3     165463
5.1       48765    50                  975.3      12     0     24429

Result for cacheExpire (times in ms)
Version   Sum      Number of entries   Average    Mode   Min   Max
5.0       350287   3929                89.1542    88     76    171
5.1       58129    562                 103.432    95     5     175

Note: results for 5.0 cover a full night while 5.1 covers only a couple of hours, so this might skew the results a bit, but using only half a day of 5.0 logs shows even worse stats for 5.0.

Actions #7

Updated by François ARMAND about 5 years ago

After many more tests, the performance problem seems to lie in `toBox` and/or `toIO` (from Box).
I'm not sure I understand why, but avoiding that performance killer in hot spots gives performance as good as in 5.0 (or even better in some cases).

The hottest spot is evalJS, which is called once per directive parameter. Porting it to IOResult leads to excellent results (a sketch of the shape of that change follows the timings below).

For example, in 5.0 I have:

===> Total out: 374 ms
===> Total in : 368 ms
===> Total exp: 34 ms

Run nodeconfigProg          : 129 ms

In 6.0 with evalJS in Box:

===> Total out: 3954 ms
===> Total in : 3904 ms
===> Total exp: 3234 ms
Run nodeconfigProg          : 4793 ms

And with evalJS in IOResult:

===> Total out: 270 ms
===> Total in : 261 ms
===> Total exp: 21 ms

config : 187 ms
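
A hypothetical sketch of the shape of that change, assuming ZIO 1.x and Lift's `Box`; the names (`evalJsBox`, `evalJsIO`, `boxToIO`, the `IOResult` alias) are illustrative, not Rudder's actual signatures:

    import net.liftweb.common.{Box, Full}
    import zio._

    object EvalJsShapeSketch {

      // stand-in for Rudder's ZIO-based result type
      type IOResult[A] = IO[Throwable, A]

      // hypothetical Box => IOResult lift, similar in spirit to the `toIO` discussed above
      def boxToIO[A](b: => Box[A]): IOResult[A] =
        IO.effect(b).flatMap {
          case Full(value) => IO.succeed(value)
          case other       => IO.fail(new RuntimeException(s"evaluation failed: $other"))
        }

      // before: one Box evaluation plus one Box => IO conversion per directive
      // parameter, i.e. the conversion layer sits inside the hottest loop
      def evalAllViaBox(params: List[String], evalJsBox: String => Box[String]): IOResult[List[String]] =
        ZIO.foreach(params)(p => boxToIO(evalJsBox(p)))

      // after: evalJS ported to IOResult, so the loop stays a single ZIO program
      // with no per-parameter conversion
      def evalAllNative(params: List[String], evalJsIO: String => IOResult[String]): IOResult[List[String]] =
        ZIO.foreach(params)(evalJsIO)
    }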
Actions #8

Updated by François ARMAND about 5 years ago

  • Status changed from Pending technical review to In progress
  • Assignee changed from Nicolas CHARLES to François ARMAND
Actions #9

Updated by François ARMAND about 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request changed from https://github.com/Normation/rudder/pull/2486 to https://github.com/Normation/rudder/pull/2529
Actions #10

Updated by François ARMAND about 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #11

Updated by Vincent MEMBRÉ about 5 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.0.0~beta1 which was released today.
