Deleted nodes should be periodically fully erased from LDAP (after some TTL)
When you delete a node from Rudder, its inventories in the LDAP backend are not deleted, just moved to "ou=Removed Inventories". Over time this piles up every node that was ever accepted, and there is currently no real way to permanently delete them or to have housekeeping "age out" entries after X amount of time (other than deleting them directly in LDAP, which is not nice).
My proposals:
- Add an API Feature to permanently delete the nodes from Rudder
- Implement some kind of housekeeping that can be activated to delete removed entries older than X (like you currently have a TTL for reports)
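A minimal sketch of what the housekeeping selection could look like, not an actual Rudder implementation. It assumes each removed entry carries an LDAP GeneralizedTime value marking when it was removed (for instance the standard `modifyTimestamp` operational attribute, which is an assumption here), and the `expired_entries` helper and its input shape are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps use the UTC GeneralizedTime form "YYYYmmddHHMMSSZ".
LDAP_TIME_FORMAT = "%Y%m%d%H%M%SZ"


def parse_ldap_time(value):
    """Parse a string such as '20140102030405Z' into an aware UTC datetime."""
    return datetime.strptime(value, LDAP_TIME_FORMAT).replace(tzinfo=timezone.utc)


def expired_entries(entries, ttl_days, now=None):
    """Return the DNs whose timestamp is older than ttl_days.

    entries: iterable of (dn, timestamp_string) pairs (hypothetical shape),
             e.g. the result of a search under "ou=Removed Inventories".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=ttl_days)
    return [dn for dn, ts in entries if parse_ldap_time(ts) < cutoff]
```

The returned DNs would then be the candidates for permanent deletion; the deletion itself is left out on purpose.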
Updated by Jonathan CLARKE over 4 years ago
I agree it would be nice to have a mechanism to delete old nodes for housekeeping purposes.
However, it should be noted that at first glance the design of OpenLDAP greatly reduces any negative impact unused entries may have:
- The total number of entries in the directory can be in the millions before OpenLDAP will show slowdown, for example.
- OpenLDAP caches are only loaded with entries and search results that are actually used, so unless you query the deleted nodes, they will never enter the cache (which is where all results should be served from)
- We don't use any indexes (apart from the mandatory objectClass which is just one entry per object) so the unused entries will not weigh in here either (see https://github.com/Normation/rudder-packages/blob/master/rudder-inventory-ldap/SOURCES/slapd.conf#L52)
However, I see that we warm up the cache by reading all entries in https://github.com/Normation/rudder-packages/blob/master/rudder-inventory-ldap/SOURCES/rudder-inventory-ldap.init#L445, including these deleted entries. That is a waste of cache space. We should change the warmup script.
Updated by Janos Mattyasovszky over 4 years ago
Depends on how you see it, but currently there is a potential to have thousands of dead entries every year, and after a couple of years the removed entries would outweigh the active ones by almost 2:1...
Keeping them in LDAP brings no benefit over not having them, whereas removing them would speed up startup (as already mentioned), not to mention backup/restore times...
Our current slapcat-based backup is 1.6GB uncompressed and takes quite a few minutes to produce, so having BACKUP_AT_SHUTDOWN="1" in rudder-slapd also makes me question whether this is the right place for the backup: it runs at basically every slapd restart and adds a lot of extra minutes to it.
I'm not saying slapd can't handle millions of entries, just that the current usage is not optimal...
I'll do some testing tomorrow to see how much these deleted inventories actually take up, as a percentage of the backup's disk space and of the backup time.
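One possible way to get the disk-space figure from an LDIF dump, as a rough sketch only. It assumes entries are separated by blank lines, that removed entries have "ou=Removed Inventories" somewhere in a plain-text `dn:` line, and it does not handle base64-encoded DNs (`dn::`); the `removed_share` helper is hypothetical:

```python
def removed_share(ldif_text, marker="ou=removed inventories"):
    """Return (removed_bytes, total_bytes) for the entries in an LDIF dump."""
    removed = total = 0
    # LDIF entries are separated by blank lines.
    for block in ldif_text.split("\n\n"):
        block = block.strip("\n")
        if not block:
            continue
        size = len(block.encode("utf-8"))
        total += size
        # Unfold continuation lines: a line starting with one space
        # continues the previous line.
        unfolded = block.replace("\n ", "")
        first_line = unfolded.split("\n", 1)[0].lower()
        # Limitation: base64-encoded DNs ("dn::") are not decoded here.
        if first_line.startswith("dn:") and marker in first_line:
            removed += size
    return removed, total
```

For a dump in the gigabyte range, reading the file entry by entry instead of as one string would be the safer variant; the counting logic stays the same.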