Bug #19213
Put node cache info into a file in place of LDAP (closed)
Description
Today, the node cache is stored in a single LDAP entry (cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration).
In big environments (10K+ nodes with lots of directives applied), this can lead to connection timeouts on the node cache, either when it is cleared:
[2021-05-03 10:33:01] ERROR policy.generation.manager - Error when updating policy, reason was: Cannot clean the configuration cache <- BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30214ms for a response to modify request with message ID 19207 for entry 'cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration' from server localhost:389. -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyModify$2(LDAPConnection.scala:579) -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyMod$1(LDAPConnection.scala:514)
[2021-05-03 10:33:01] ERROR policy.generation.manager - Policy update error for process '135874' at 2021-05-03 10:33:01: Cannot clean the configuration cache
Or when updated:
[2021-05-03 10:32:22] ERROR inventory-processing - Error when trying to process report: Can't merge inventory report in LDAP directory, aborting; cause was: Exception when commiting inventory, abort.; cause was: BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30000ms for a response...
The problem is that this entry is VERY big: for each node, it contains one hash of the important version information for the node's directives, properties, etc. So it grows linearly with the number of nodes, and quadratically with the number of directives applied.
This information is related to policy generation and should not be in LDAP, which is our base for stable information like policy configuration and node inventories.
We need to put that entry somewhere else: either in PostgreSQL (but we don't have a convenient table for that, or at least it would mean quite a big change for a patch release), or in a JSON file on the file system (which could be a direct port of the current access/update/modify logic, but in a place where connection timeouts don't happen and don't impact other parts of the application).
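As a rough illustration of the file-based option, the sketch below keeps one hash per node in a JSON file. A missing or corrupt file is treated as an empty cache, and writes go through a temporary file followed by an atomic rename. This is only a Python sketch under stated assumptions, not Rudder's actual code: the class name `NodeHashCache`, its methods, and the flat `{node_id: hash}` layout are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


class NodeHashCache:
    """Hypothetical file-backed cache mapping node IDs to configuration hashes."""

    def __init__(self, path):
        self.path = Path(path)

    def read_all(self):
        """Return cached hashes; a missing or corrupt file yields an empty cache."""
        try:
            with self.path.open(encoding="utf-8") as f:
                data = json.load(f)
        except (FileNotFoundError, ValueError):
            # ValueError covers both invalid JSON and invalid UTF-8.
            return {}
        return data if isinstance(data, dict) else {}

    def _write_all(self, hashes):
        """Write to a temp file in the same directory, then rename atomically."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(hashes, f)
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise

    def update(self, new_hashes):
        """Add or overwrite the hashes of the given nodes."""
        merged = self.read_all()
        merged.update(new_hashes)
        self._write_all(merged)

    def delete(self, node_ids):
        """Drop the given nodes from the cache, keeping all others."""
        to_drop = set(node_ids)
        kept = {k: v for k, v in self.read_all().items() if k not in to_drop}
        self._write_all(kept)

    def clear(self):
        """Reset the cache to empty, which invalidates every node's hash."""
        self._write_all({})
```

Treating an unreadable file as an empty cache is a design choice made for this sketch: in the worst case every node simply looks as if it has no cached hash.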
Workaround
You can make the timeout disappear by editing /etc/default/rudder-jetty: uncomment the JAVA_OPTIONS= line and set:
JAVA_OPTIONS="-Dcom.unboundid.ldap.sdk.LDAPConnectionOptions.defaultModifyResponseTimeoutMillis=300000"
(or, if you already have JAVA_OPTIONS set, add the new option after a space)
Then restart Rudder.
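For scripting the edit across several servers, the rule above (uncomment the line, or append after a space if it is already set) can be sketched as a small text transformation. This is an illustrative Python sketch only: the helper name `add_java_option` is hypothetical, and it assumes the file holds a single-line `JAVA_OPTIONS=...` assignment, optionally commented out with `#`, which may not match every installation.

```python
import re

# The option given in the workaround above.
TIMEOUT_OPT = (
    "-Dcom.unboundid.ldap.sdk.LDAPConnectionOptions"
    ".defaultModifyResponseTimeoutMillis=300000"
)

# Matches 'JAVA_OPTIONS=...' with an optional leading '#' (assumed file layout).
_LINE = re.compile(r"^\s*(#\s*)?JAVA_OPTIONS=(.*)$")


def add_java_option(content, option=TIMEOUT_OPT):
    """Return the file content with `option` present in the JAVA_OPTIONS line."""
    lines = content.splitlines()
    matches = [(i, m) for i, m in ((i, _LINE.match(l)) for i, l in enumerate(lines)) if m]
    # Prefer an active (uncommented) line over a commented one.
    active = [(i, m) for i, m in matches if not m.group(1)]
    target = (active or matches or [None])[0]

    if target is None:
        # No JAVA_OPTIONS line at all: add one at the end.
        lines.append(f'JAVA_OPTIONS="{option}"')
    else:
        i, m = target
        current = m.group(2).strip().strip('"').strip()
        if m.group(1) or not current:
            value = option                  # commented out or empty: start fresh
        elif option in current.split():
            value = current                 # already present: leave unchanged
        else:
            value = f"{current} {option}"   # append after a space
        lines[i] = f'JAVA_OPTIONS="{value}"'
    return "\n".join(lines) + "\n"
```

The function only transforms text; reading and writing /etc/default/rudder-jetty and restarting Rudder are left to the caller.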
Updated by François ARMAND over 3 years ago
- Status changed from New to In progress
Updated by François ARMAND over 3 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Vincent MEMBRÉ
- Pull Request set to https://github.com/Normation/rudder/pull/3615
Updated by François ARMAND over 3 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|3468f292d643bc33309f82e8ed1afb4b7b5611fe.
Updated by François ARMAND over 3 years ago
- Fix check changed from To do to Checked
The fix works great. Tested by removing only one node, corrupting the file, deleting the file, etc.
Updated by Vincent MEMBRÉ over 3 years ago
This bug has been fixed in Rudder 6.1.13 and 6.2.7 which were released today.
Updated by Vincent MEMBRÉ over 3 years ago
- Status changed from Pending release to Released
Updated by François ARMAND over 3 years ago
- Related to Bug #19589: NoSuchFileException: /var/rudder/policy-generation-info/node-configuration-hashes.json added