Bug #4242
closedPromise generation takes too long when getting more and more nodes
Description
Hello,
On my 130 managed nodes Rudder server, rules generation takes up to 6mn30s.
Here is a quick run example:
Success: Rules applied at 2013-12-11 15:14 (took 5 min 40s)
To be honest it starts to make the new server provisioning workflow slow because of this.
A new server has to wait around 10mn before being able to apply rudder rules which are
From what i understand, cf-promises is run on all nodes id directories which is ~1.2s here:
/usr/bin/time /opt/rudder/bin/cf-promises -f /var/rudder/share/a17dc652-4f84-4fa2-8c97-26e15e6f42b3/rules/cfengine-community/promises.cf 1.15user 0.04system 0:01.22elapsed 98%CPU (0avgtext+0avgdata 25248maxresident)k 0inputs+16outputs (0major+8468minor)pagefaults 0swaps
While not being the only culprit here, it takes a good 2minutes to run over all 130 nodes. Wouldn't that be faster to only run it on one fully generated rule and then simply copy promise files over their id directories?
Should i try to increase LDAP cache - 1GB by default - or increase JVM heap size - 1GB also?
Let me know if you need more informations.
Olivier
Updated by Olivier Mauras almost 11 years ago
By setting
rudder.community.checkpromises.command=/bin/trueit now gives me
Success: Rules applied at 2013-12-11 16:27 (took 2 min 46s)
Updated by François ARMAND almost 11 years ago
Hello,
I suspect one part of the generation to take a lot of time, related to putting things in LDAP. Could you please run in debug and copy here for the generation timing logs? It should be something like:
[2013-12-11 16:55:27] DEBUG com.normation.rudder.services.servers.NodeConfigurationServiceImpl - Written in ldap the node configuration caches in 134 millisec .... .... .... [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Reports updated in 572ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Timing summary: [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Fetch all information : 361 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Historize names : 1333 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build current rule vals : 185 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Expand rule parameters : 237 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build target configuration: 400 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Update rule vals : 587 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Increment rule serials : 274 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Write node configurations : 1576 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Save expected reports : 572 ms [2013-12-11 16:55:28] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Deployment completed in 5667 millisec
And if I guess correctly, the "written in LDAP" (really good english, thought) and the "write node config" should take a lot of time.
Thanks,
Updated by Olivier Mauras almost 11 years ago
timing summary:
[2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Timing summary: [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Fetch all information : 218 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Historize names : 143 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build current rule vals : 37 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Expand rule parameters : 282 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build target configuration: 38300 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Update rule vals : 263 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Increment rule serials : 160 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Write node configurations : 103092 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Save expected reports : 20477 ms [2013-12-11 17:06:37] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Deployment completed in 163102 millisec
Updated by Olivier Mauras almost 11 years ago
With ldap directory on tmpfs:
[2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Timing summary: [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Fetch all information : 205 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Historize names : 112 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build current rule vals : 36 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Expand rule parameters : 259 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Build target configuration: 39686 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Update rule vals : 258 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Increment rule serials : 114 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Write node configurations : 152479 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Save expected reports : 9591 ms [2013-12-11 17:28:53] DEBUG com.normation.rudder.services.policies.DeploymentServiceImpl - Deployment completed in 202853 millisec
Not better at all :(
Updated by Matthieu CERDA almost 11 years ago
- Category set to Web - Config management
- Status changed from New to 8
- Assignee set to François ARMAND
- Target version set to 2.6.11
Qualified as an important bug, assigning to FAR.
Updated by François ARMAND almost 11 years ago
Some more thought on that one.
There is two parts on that problem, one related to storage of generated node coniguration, the other linked to the validation of generated promises with cf-promises.
Say in that one, we are going to manage the storage of configuraiton, and see #4427 for the cf-promise part.
So, the problem: we store all the node configuration in our LDAP base. We need them in a "snapshot" way, so that 1/ we are sure that the configuration is not modifed along the generation process, 2/we can debug by seeing what the Rudder-vision of the node configuration was, and especially 3/ we are able to decide if a node configuration was modified compared to the last generation round, so that we only process actually updated node configuration further down the promise generation algorithm.
A node configuration is something big, and composed of a lot of domain objects :
- node variables (hostname and the like),
- Rudder parameters,
- system variables,
- and of course, directives applied to that node with (possibly) specific parameters.
For a node with 50 directives, that leads to 101 LDAP objects (one big). So, for 100 nodes, that leads to 10 000 LDAP objects.
In the case that we don't have any luck, we have to rewrite ALL these object, and LDAP is not really good for that.
So, the solution is to change the node configuration data model. What we really need is the 3/ above (be able to decide if a node configuration is updated). For that, we don't need all the actual node configuration, some kind of hash can be sufficient.
But in that case, we loose 2/, the ability to debug. The good answer to that is to have a runtime configurable switch that will dump node coniguration in a human readable format, only when we want to debug - no need to impact performance with something not used in the general case.
And for 1/, the promise generation algorithm has dramatically changed since Rudder 2.1, and is mostly based on immutable, in-memory object now. So we "just" have to finish the transformation, and since the whole process is mono-threaded, use these in-memory objects to forbid any race-condition-like problem.
Updated by François ARMAND almost 11 years ago
Work in progress: https://github.com/fanf/rudder/tree/proto/remove_node_configuration
Everything compile and tests pass, need some cleaning and a more clean way to enable the debuging.
First performance tests are REALLY good (from tens of seconds to less than one second for that part). Need some more use case scenario to check.
Updated by François ARMAND almost 11 years ago
- Assignee changed from François ARMAND to Nicolas CHARLES
A first version of pull requests are available here:
- https://github.com/Normation/cf-clerk/pull/32
- https://github.com/Normation/rudder/pull/435
They are quite heavy, so Nico, let's talk about them!
Updated by François ARMAND almost 11 years ago
- Target version changed from 2.6.11 to 2.10.0~beta1
Seing the impact on code, it can't be a simple bug correction, but should go in the next version.
We are currently looking for workarounds limiting the timing for earlier version of Rudder.
Updated by François ARMAND almost 11 years ago
For the record, with first testing data, with 120 nodes and around 50 directives on my testing machine we have:
- before: around 70 second JUST FOR SAVING IN LDAP
- now: less than 500 ms
It was around 50 ms for ~20 nodes, so it seems to gain something like 3-5 ms by node almost lineary with nodes, and with little impact for the number of directive.
Updated by François ARMAND almost 11 years ago
- Pull Request set to https://github.com/Normation/rudder/pull/435
Updated by Nicolas CHARLES over 10 years ago
- Assignee changed from Nicolas CHARLES to François ARMAND
Francois, I've made a remark about the file name, that could be a constant;
otherwise that's good for me !
Thank you
Updated by François ARMAND over 10 years ago
- Status changed from 8 to Pending release
- % Done changed from 0 to 100
Applied in changeset 49ed57fa27581e58a7f71efd0494b12d4d0be864.
Updated by Nicolas CHARLES over 10 years ago
Applied in changeset 9b6f114c07af3f801672971b9d2fba733ef0b849.
Updated by Nicolas CHARLES over 10 years ago
- Status changed from Pending release to Pending technical review
Francois, I've added a PR that add error message when something can go bad for two step of deployment
https://github.com/Normation/rudder/pull/456
Updated by Nicolas CHARLES over 10 years ago
- Status changed from Pending technical review to Pending release
remove the second PR to #4583
Updated by Vincent MEMBRÉ over 10 years ago
- Tracker changed from Bug to User story
- Subject changed from Rules generation is too long to Speed up promises generation by optimization on LDAP requests
- Category changed from Web - Config management to Performance and scalability
Updated by Vincent MEMBRÉ over 10 years ago
- Tracker changed from User story to Bug
- Pull Request set to https://github.com/Normation/rudder/pull/435
Updated by Vincent MEMBRÉ over 10 years ago
- Subject changed from Speed up promises generation by optimization on LDAP requests to Bad usage of ldap requests during promise generation make it very long
Updated by Vincent MEMBRÉ over 10 years ago
- Subject changed from Bad usage of ldap requests during promise generation make it very long to Promise generation takes too long when getting more and more nodes
Updated by Vincent MEMBRÉ over 10 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 2.10.0~beta1, which was released today.
Check out:
The release announcement: http://www.rudder-project.org/pipermail/rudder-announce/2014-March/000084.html
The full ChangeLog: http://www.rudder-project.org/foswiki/bin/view/System/Documentation:ChangeLog210
Download information: https://www.rudder-project.org/site/get-rudder/downloads/