Bug #19213

closed

Put node cache info into a file in place of LDAP

Added by François ARMAND almost 3 years ago. Updated almost 2 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: Checked
Regression:

Description

Today, the node configuration cache is stored in a single LDAP entry (cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration).

In big environments (10K+ nodes with lots of directives applied), this can lead to LDAP timeouts on the node cache entry, either when the cache is cleared:

[2021-05-03 10:33:01] ERROR policy.generation.manager - Error when updating policy, reason was: Cannot clean the configuration cache <- BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30214ms for a response to modify request with message ID 19207 for entry 'cn=Nodes Configuration,ou=Rudder,cn=rudder-configuration' from server localhost:389.
 -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyModify$2(LDAPConnection.scala:579)
 -> com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyMod$1(LDAPConnection.scala:514)
[2021-05-03 10:33:01] ERROR policy.generation.manager - Policy update error for process '135874' at 2021-05-03 10:33:01: Cannot clean the configuration cache

Or when updated:

[2021-05-03 10:32:22] ERROR inventory-processing - Error when trying to process report: Can't merge inventory report in LDAP directory, aborting; cause was: Exception when commiting inventory, abort.; cause was: BackendException: Error when doing action 'modify' with and LDIF change request: null; cause was: com.unboundid.ldap.sdk.LDAPException: A client-side timeout was encountered while waiting 30000ms for a response...

The problem is that this entry is VERY big: for each node, it contains one hash of the important version information for the node's directives, properties, etc. So it grows linearly with the number of nodes, and quadratically with the number of directives applied.

This information is related to policy generation and should not be in LDAP, which is our base for stable information like policy configuration and node inventories.

We need to put that entry somewhere else: either in PostgreSQL (but we don't have a convenient table for that, or at least it would mean quite a big change for a patch release), or in a JSON file on the file system (which could be a direct port of the current access/update/modify logic, but in a place where connection timeouts don't happen and can't impact other parts of the application).
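
As a rough illustration of the file-based option, here is a minimal Python sketch of access/update/clear logic over a JSON file. It is not the Rudder implementation (which is written in Scala). The flat node-id-to-hash layout, the function names, and the choice to treat a missing or corrupt file as an empty cache are all assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def load_hashes(path):
    """Return the node-id -> hash mapping (hypothetical layout).

    A missing or unparsable file is treated as an empty cache.
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, ValueError):
        # ValueError covers json.JSONDecodeError and bad encodings.
        return {}
    return data if isinstance(data, dict) else {}


def save_hashes(path, hashes):
    """Write the mapping atomically: temp file in the same directory, then rename."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(hashes, f, sort_keys=True)
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def update_hashes(path, updates):
    """Merge new or changed node hashes into the file."""
    hashes = load_hashes(path)
    hashes.update(updates)
    save_hashes(path, hashes)
    return hashes


def delete_hashes(path, node_ids):
    """Remove the entries of the given nodes, ignoring unknown ids."""
    hashes = load_hashes(path)
    for node_id in node_ids:
        hashes.pop(node_id, None)
    save_hashes(path, hashes)
    return hashes


def clear_hashes(path):
    """Reset the cache to an empty mapping."""
    save_hashes(path, {})
```

Writing to a temporary file and renaming it means a crash in the middle of a write cannot leave a half-written cache behind. Falling back to an empty mapping when the file is missing or unreadable would presumably just cause the affected nodes to be regenerated.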

Workaround

You can make the timeout disappear: in /etc/default/rudder-jetty, uncomment the JAVA_OPTIONS= line and set it to:

JAVA_OPTIONS="-Dcom.unboundid.ldap.sdk.LDAPConnectionOptions.defaultModifyResponseTimeoutMillis=300000" 

(or, if you already have JAVA_OPTIONS set, append the new option after a space)

Then restart Rudder.
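
For anyone scripting this workaround, the edit can be sketched as a small Python helper that works on the text of the defaults file. The helper name and the assumed file layout (one `JAVA_OPTIONS=...` assignment per line, optionally commented out with `#`, value in double quotes) are illustrative assumptions, not something confirmed by the ticket.

```python
import re

TIMEOUT_OPTION = (
    "-Dcom.unboundid.ldap.sdk.LDAPConnectionOptions."
    "defaultModifyResponseTimeoutMillis=300000"
)

# Matches an active or commented-out assignment (assumed layout), e.g.
#   JAVA_OPTIONS="-Xmx1g"   or   #JAVA_OPTIONS=
_ASSIGNMENT = re.compile(r'^(?P<comment>\s*#\s*)?JAVA_OPTIONS=(?P<value>.*)$')


def add_java_option(text, option=TIMEOUT_OPTION):
    """Return the file text with `option` present in JAVA_OPTIONS."""
    lines = text.splitlines()
    active = commented = None
    for i, line in enumerate(lines):
        m = _ASSIGNMENT.match(line)
        if m is None:
            continue
        if m.group("comment") is None:
            if active is None:
                active = i
        elif commented is None:
            commented = i

    if active is not None:
        # JAVA_OPTIONS already set: append the option after a space.
        value = _ASSIGNMENT.match(lines[active]).group("value").strip().strip('"')
        if option not in value.split():
            value = f"{value} {option}".strip()
        lines[active] = f'JAVA_OPTIONS="{value}"'
    elif commented is not None:
        # Uncomment the line and set it to the option.
        lines[commented] = f'JAVA_OPTIONS="{option}"'
    else:
        # No assignment at all: add one at the end.
        lines.append(f'JAVA_OPTIONS="{option}"')
    return "\n".join(lines) + "\n"
```

The function is idempotent, so running it twice does not duplicate the option. Reading the file, writing the result back to /etc/default/rudder-jetty, and restarting the service are left to the caller.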


Subtasks 1 (0 open, 1 closed)

Bug #19221: isWritable name change in version of betterfiles in 6.2 (Released, Vincent MEMBRÉ)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #19589: NoSuchFileException: /var/rudder/policy-generation-info/node-configuration-hashes.json (Released, Elaad FURREEDAN)
Actions #1

Updated by François ARMAND almost 3 years ago

  • Status changed from New to In progress
Actions #2

Updated by François ARMAND almost 3 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder/pull/3615
Actions #3

Updated by François ARMAND almost 3 years ago

  • Status changed from Pending technical review to Pending release
Actions #4

Updated by François ARMAND almost 3 years ago

  • Fix check changed from To do to Checked

The fix works great. Tested by removing a single node, corrupting the file, deleting the file, etc.

Actions #5

Updated by Vincent MEMBRÉ almost 3 years ago

This bug has been fixed in Rudder 6.1.13 and 6.2.7 which were released today.

Actions #6

Updated by Vincent MEMBRÉ almost 3 years ago

  • Status changed from Pending release to Released
Actions #7

Updated by François ARMAND over 2 years ago

  • Related to Bug #19589: NoSuchFileException: /var/rudder/policy-generation-info/node-configuration-hashes.json added