Bug #19213

closed

Put node cache info into a file in place of LDAP

Added by François ARMAND almost 3 years ago. Updated almost 2 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: Checked
Regression:

Description

Today, the node cache is stored in a single LDAP entry (cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration).

In big environments (10K+ nodes with lots of directives applied), this can lead to LDAP timeouts on the node cache, either when it is cleared:

[2021-05-03 10:33:01] ERROR policy.generation.manager - Error when updating policy, reason was: Cannot clean the configuration cache <- BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30214ms for a response to modify request with message ID 19207 for entry 'cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration' from server localhost:389.
 -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyModify$2(LDAPConnection.scala:579)
 -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyMod$1(LDAPConnection.scala:514)
[2021-05-03 10:33:01] ERROR policy.generation.manager - Policy update error for process '135874' at 2021-05-03 10:33:01: Cannot clean the configuration cache

Or when it is updated:

[2021-05-03 10:32:22] ERROR inventory-processing - Error when trying to process report: Can't merge inventory report in LDAP directory, aborting; cause was: Exception when commiting inventory, abort.; cause was: BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30000ms for a response...

The problem is that this entry is VERY big: for each node, it contains one hash of the important version info of the node's directives, properties, etc. So it grows linearly with the number of nodes, and quadratically with the number of directives applied.

This information is related to policy generation and should not be in LDAP, which is our base for stable information like policy configuration and node inventories.

We need to put that entry somewhere else, either in PostgreSQL (but we don't have a convenient table for that, or at least it would mean quite a big change for a patch release) or in a JSON file on the filesystem (which could be a direct port of the current access/update/modify logic, but in a place where connection timeouts don't happen and don't impact other parts of the app).
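
As a rough illustration of the JSON file option, here is a minimal sketch of the read/update/clear logic on a file. It is written in Python for brevity (Rudder itself is Scala). The function names and the JSON layout (a flat object mapping a node ID to its hash) are assumptions made for the example, not the actual implementation. The file name in the usage note is the one quoted in related issue #19589.

```python
import json
import os
import tempfile
from pathlib import Path


def read_hashes(path):
    """Return the node ID -> hash mapping, or an empty dict if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def write_hashes(path, hashes):
    """Write the whole mapping atomically: temp file in the same directory, then rename."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(hashes, f, sort_keys=True)
        # os.replace is atomic on the same filesystem, so readers never see a partial file.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def update_hashes(path, updates):
    """Merge new or changed node hashes into the file (hypothetical layout)."""
    hashes = read_hashes(path)
    hashes.update(updates)
    write_hashes(path, hashes)


def clear_hashes(path):
    """Clear the cache by writing an empty mapping."""
    write_hashes(path, {})
```

With this shape, clearing the cache is `clear_hashes("/var/rudder/policy-generation-info/node-configuration-hashes.json")`, and no LDAP connection is involved. Concurrent writers would still need a lock around `update_hashes`, which this sketch leaves out.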

Workaround

You can make the timeout disappear by raising the LDAP modify timeout: in /etc/default/rudder-jetty, uncomment the JAVA_OPTIONS= line and set:

JAVA_OPTIONS="-Dcom.unboundid.ldap.sdk.LDAPConnectionOptions.defaultModifyResponseTimeoutMillis=300000" 

(If JAVA_OPTIONS is already set, append the new option to it, separated by a space.)

Then restart Rudder.


Subtasks 1 (0 open, 1 closed)

Bug #19221: isWritable name change in version of betterfiles in 6.2 (Released, Vincent MEMBRÉ)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #19589: NoSuchFileException: /var/rudder/policy-generation-info/node-configuration-hashes.json (Released, Elaad FURREEDAN)