Bug #6780

closed

Node not included in dynamic group due to openldap bug with modrdn not showing node children

Added by Dennis Cabooter almost 9 years ago. Updated about 8 years ago.

Status: Released
Priority: N/A
Category: Server components

Description

Some queries sometimes (apparently only on CentOS/Red Hat) do not return the proper list of nodes.
The cause is the idlcache of OpenLDAP, which stores the cached results of ONE and SUB queries for entries. However, if we move an entry that has subchildren (with modrdn), OpenLDAP fails to properly update the subchildren in the cache, hence no result.
Deactivating the idlcache or restarting slapd works around the issue.

A patch was proposed by Jon (attached), and a ticket was opened on the OpenLDAP bug tracker ( http://www.openldap.org/its/index.cgi/Incoming?id=8378 ).

The old ticket description follows.

Several times a week, the Rudder web interface stops working properly. Searches time out and the pie charts on the dashboard don't appear. There is no error in slapd.log, but the webapp logs indicate that LDAP is the culprit:

[2015-06-22 14:59:21] ERROR com.normation.ldap.sdk.ROPooledSimpleAuthConnectionProvider - Can't execute LDAP request
com.unboundid.ldap.sdk.LDAPSearchException: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 3, base DN 'cn=rudder-configuration', scope SUB, and filter '(&(|(objectClass=rudderNode)(&(objectClass=node)(entryDN:dnOneLevelMatch:=ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration)))(modifyTimestamp>=20150622100542.001Z))' from server localhost:389.

As Nicolas Charles suggested on IRC, I stopped rudder-jetty and rudder-slapd. However, rudder-jetty is restarted automatically within 5 minutes, and I had to force-stop slapd. Then I reindexed LDAP and started the services. Unfortunately, the problem didn't go away.


Files

slapd.conf (2.89 KB), slapd, Nicolas CHARLES, 2015-06-30 10:13
DB_CONFIG (580 Bytes), db_config, Nicolas CHARLES, 2015-06-30 10:13
networkInterface (8.22 KB), Nicolas CHARLES, 2015-06-30 10:14
openldap-dn2id-modrdn-idlcache-sub.patch (1.27 KB), Nicolas CHARLES, 2016-02-23 08:38

Subtasks 2 (0 open, 2 closed)

Bug #7965: Apply patches on openldap in Rudder >= 3.0 (Released, Jonathan CLARKE, 2016-02-23)
Bug #8000: Broken LDAP on Rudder nightly (Released, Alexis Mousset, 2016-02-29)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #6931: Update OpenLDAP to 2.4.41 (Released, Vincent MEMBRÉ, 2015-07-06)
Actions #1

Updated by Nicolas CHARLES almost 9 years ago

  • Assignee changed from Nicolas CHARLES to François ARMAND

I'm assigning this issue to François, as he's more literate in LDAP than I am.

Updated by Nicolas CHARLES almost 9 years ago

I'm having a probably related issue: on a CentOS 6 VM, when the server has been used quite a lot, the groups don't get updated when accepting a node.
This is not related to the webapp (I connected from a remote webapp to double-check this point). It simply seems that LDAP is not taking new entries into account in complex searches (with regex at least).

I lost the webapp trace from LDAP, but I'm searching for Linux nodes, with CentOS as OS, and an IP address matching the regex 192.168.42.1[3|4].
Accepting a node with IP 192.168.42.13 didn't update the group, and it shows in the webapp trace (it simply didn't find the network interface 192.168.42.13).
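
A side note on the search criterion itself: in common regex dialects, including Java's and Python's, `[3|4]` is a character class that matches `3`, `4` or a literal `|`, and an unescaped `.` matches any character. The address 192.168.42.13 does match the pattern as written, so this is unlikely to explain the missing node. The sketch below uses Python's `re` module purely for illustration; the names `loose`, `strict` and `matches` are hypothetical and nothing here reflects how Rudder evaluates the criterion.

```python
import re

# Pattern as written in the comment: '[3|4]' is a character class
# (matches '3', '4' or '|') and each '.' matches any single character.
loose = re.compile(r"192.168.42.1[3|4]")

# Stricter variant, shown only as an illustration.
strict = re.compile(r"192\.168\.42\.1[34]")


def matches(pattern, value):
    """Return True when the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None
```

Both patterns accept 192.168.42.13 and 192.168.42.14; only the loose one also accepts strings such as `192.168.42.1|`.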

Rebooting the server solves the issue, as does reindexing LDAP.

Some info:

[root@server openldap-data]# ls -al
total 108624
drwxr-xr-x 2 root root       4096 30 juin  10:00 .
drwxr-xr-x 5 root root       4096  4 juin  17:53 ..
-rw-r--r-- 1 root root       4096 30 juin  10:00 alock
-rw------- 1 root root      24576 30 juin  10:00 __db.001
-rw------- 1 root root   65896448 30 juin  10:08 __db.002
-rw------- 1 root root 1073741824 30 juin  10:08 __db.003
-rw------- 1 root root    2359296 30 juin  10:07 __db.004
-rw------- 1 root root   35831808 30 juin  10:08 __db.005
-rw------- 1 root root      32768 30 juin  10:01 __db.006
-rw-r--r-- 1 root root        580  3 juin  08:28 DB_CONFIG
-rw------- 1 root root     372736 23 juin  18:48 dn2id.bdb
-rw------- 1 root root    2228224 24 juin  13:35 id2entry.bdb
-rw------- 1 root root   10485760 30 juin  10:01 log.0000000003
-rw------- 1 root root      65536 23 juin  18:48 objectClass.bdb

I also attached my DB_CONFIG and slapd.conf, as well as a simple LDAP search.

Actions #3

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 3.0.7 to 3.0.8
Actions #4

Updated by Nicolas CHARLES almost 9 years ago

Some more data:
I can reproduce my bug (wrong search results) quite reliably. I need to accept a node after having put some load on LDAP.

What this bug is:
  1. Searches directly on nodes work OK (on hostname, OS, primary IP address).
  2. Searches combined with a sub-element (mount point, network interface) fail to find the new node.
  3. Searches combined with hardware (BIOS name) fail.
  4. The node does show up in the web interface, with its complete inventory.

Requests that work and requests that don't:

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'ou=Machines,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s sub '(&(objectClass=machine)(objectClass=*))' 1.1 motherBoardUuid
Correctly finds the new node

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'networkInterface=eth1,nodeId=11c74371-062c-4b9d-a089-d28cc1b5f637,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s base
Correctly finds the network interface

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'nodeId=11c74371-062c-4b9d-a089-d28cc1b5f637,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s one '(&(objectClass=networkInterfaceLogicalElement)(objectClass=*))'
Correctly finds the network interface

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s sub '(&(objectClass=networkInterfaceLogicalElement)(objectClass=*))' 1.1 ipHostNumber
Does not find the network interface

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s sub '(&(objectClass=networkInterfaceLogicalElement)(objectClass=*))' 1.1 ipHostNumber
Does not find the network interface

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s sub 'networkInterfaceMacAddress=08:00:27:06:97:21'
Does not find the network interface

ldapsearch -H ldap://localhost:389/ -D cn=manager,cn=rudder-configuration -w 3cfd665c2c8b -b 'nodeId=11c74371-062c-4b9d-a089-d28cc1b5f637,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration' -s sub '(&(objectClass=networkInterfaceLogicalElement)(objectClass=*))' 1.1 ipHostNumber
Finds the network interface

Weird, huh ?
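
The pattern in the queries above is that a `sub` search rooted at the node finds entries that a `sub` search rooted higher in the tree misses. A minimal sketch of that comparison follows, assuming the DNs returned by each ldapsearch (run with the same filter) have already been collected into Python lists. The helpers `normalize`, `is_under` and `missing_from_wide_search` are hypothetical, and the DN handling is a naive string comparison that ignores escaped commas.

```python
def normalize(dn):
    """Naive DN normalization: lowercase, strip spaces around commas."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def is_under(dn, base):
    """True when dn equals base or sits below it (string comparison only)."""
    dn, base = normalize(dn), normalize(base)
    return dn == base or dn.endswith("," + base)


def missing_from_wide_search(wide_base, wide_results, narrow_results):
    """Entries found by the narrow search that the wide 'sub' search
    should also have returned (they are under wide_base) but did not."""
    seen = {normalize(dn) for dn in wide_results}
    return [dn for dn in narrow_results
            if is_under(dn, wide_base) and normalize(dn) not in seen]
```

A non-empty result for the same filter means the two searches disagree, which is the inconsistency described in this comment.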

Actions #5

Updated by Nicolas CHARLES almost 9 years ago

 uname -a
Linux server 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Centos 6.3

Actions #6

Updated by Nicolas CHARLES almost 9 years ago

restarting rudder-slapd solves the issue

Actions #7

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 3.0.8 to 3.0.9
Actions #8

Updated by Nicolas CHARLES over 8 years ago

  • Related to Bug #6931: Update OpenLDAP to 2.4.41 added
Actions #9

Updated by Nicolas CHARLES over 8 years ago

  • Status changed from New to Rejected

This was probably solved by #6931; rejecting this ticket.
Dennis, if the issue still occurs, feel free to reopen this ticket.

Thank you !
Nicolas

Actions #10

Updated by Nicolas CHARLES over 8 years ago

Ha, I still have search errors.

@(#) $OpenLDAP: slapd 2.4.41 (Dec 10 2015 03:57:37) $
    root@centos-builder-6-64.labo.normation.com:/usr/src/rudder-packages/package/BUILD/openldap-source/servers/slapd

A query on operating system type Linux + operating system name CentOS + IP address regex 192.168.42.1[3|4] fails to find a given node.
Accepting this specific node was very slow (more than 1 minute with nothing happening).

I do not have any errors in the logs.

Replacing the regex with IP == 192.168.42.14 does not work either.

Searching for "Network Interface IP address is defined" doesn't return my new node.

Restarting LDAP solves the issue.

Actions #11

Updated by Nicolas CHARLES over 8 years ago

The server is a CentOS 6 server, which has been upgraded from 3.0~alpha1 to 3.15, in a lot of different phases.

Actions #12

Updated by Nicolas CHARLES about 8 years ago

  • Status changed from Rejected to New
  • Target version changed from 3.0.9 to 3.0.14

I'm reviving this ticket, as it has been seen reappearing in the wild.

Actions #13

Updated by François ARMAND about 8 years ago

  • Subject changed from The Rudder web interface fails on LDAP to Node not included in dynamic group due to openldap bug with modrdn not showing node children
  • Status changed from New to In progress

So, it appears again, on CentOS 7 only, not sure why. Hard to reproduce, but we did it!

It seems to be a bug with modrdn on hdb, which doesn't update the relation between the children and their grandparents when an entry is moved. The children still think they are in the old place, which explains why a search on the new grandparent does not show the children, but one directly on the parent DOES show them.
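
The symptom can be reproduced in a toy model. The sketch below is not OpenLDAP's code: `ToyDirectory`, its `sub_cache` and the `buggy` flag are hypothetical, and the cache is a plain dictionary keyed by search base. It only illustrates how a subtree cache that is updated for the moved entry, but not for its children, gives the results observed here.

```python
class ToyDirectory:
    """Toy tree with a cache of 'sub' search results (illustration only)."""

    def __init__(self):
        self.parent = {}     # entry -> parent entry (None for the root)
        self.sub_cache = {}  # base -> cached result of a 'sub' search

    def add(self, name, parent=None):
        self.parent[name] = parent
        self.sub_cache.clear()  # keep the toy simple: adds reset the cache

    def _chain(self, name):
        """Yield name, then each ancestor up to the root."""
        while name is not None:
            yield name
            name = self.parent[name]

    def _subtree(self, name):
        """Uncached lookup: name plus everything below it."""
        return {e for e in self.parent if name in self._chain(e)}

    def search_sub(self, base):
        if base not in self.sub_cache:
            self.sub_cache[base] = self._subtree(base)
        return set(self.sub_cache[base])

    def move(self, name, new_parent, buggy=True):
        """Re-parent name; the buggy variant forgets its children."""
        touched = {name} if buggy else self._subtree(name)
        old_chain = list(self._chain(self.parent[name]))
        self.parent[name] = new_parent
        for base in old_chain:            # drop from the old ancestors
            if base in self.sub_cache:
                self.sub_cache[base] -= touched
        for base in self._chain(new_parent):  # add to the new ancestors
            if base in self.sub_cache:
                self.sub_cache[base] |= touched

    def restart(self):
        """Model of restarting slapd: the cache starts empty again."""
        self.sub_cache.clear()
```

In this model, once a `sub` search on the new grandparent has been cached, moving a node with `buggy=True` makes later searches on that grandparent return the node without its children, while a search rooted at the node itself still returns them. Calling `restart()` brings the results back in line.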

The bug seems to be related to idlcachesize, because setting it to 0 makes the problem non-reproducible.
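
A minimal sketch of that workaround, assuming the configuration text already contains an `idlcachesize` directive (a setting of the bdb/hdb backends). The helper name `disable_idlcache` is hypothetical, and the function only rewrites text passed to it; it does not locate, write or reload the actual slapd.conf.

```python
import re

# Matches an existing 'idlcachesize <n>' line, keeping its indentation.
_IDL_LINE = re.compile(
    r"^([ \t]*)idlcachesize[ \t]+\d+[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def disable_idlcache(conf_text):
    """Return conf_text with every idlcachesize directive set to 0."""
    return _IDL_LINE.sub(r"\1idlcachesize 0", conf_text)
```

Since slapd.conf is read at startup, slapd would need a restart after such a change, which on its own also clears the cache.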

It seems that the latest version of OpenLDAP (2.4.44) does not have the problem, but this needs more confirmation, since nothing in the changelog between our version (2.4.41) and that one is related.

Actions #14

Updated by Nicolas CHARLES about 8 years ago

Actions #16

Updated by Alexis Mousset about 8 years ago

  • Assignee changed from Benoît PECCATTE to Alexis Mousset
Actions #17

Updated by Alexis Mousset about 8 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to Jonathan CLARKE
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/886
Actions #18

Updated by Alexis Mousset about 8 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
Actions #19

Updated by Vincent MEMBRÉ about 8 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.11.19, 3.0.14, 3.1.8 and 3.2.1 which were released today.
