Bug #12937

closed

In Rudder 6.2.0, the merge_uuid part of inventory processing gets extremely slow on Debian

Added by Nicolas CHARLES over 5 years ago. Updated almost 2 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority: 77
Name check:
Fix check: Checked
Regression:

Description

After some time, inventory processing on a Rudder server that manages Debian systems becomes extremely slow: up to several minutes per inventory, versus the expected duration of under 1 s. This causes a chain of problems, as to-be-processed inventories accumulate in /var/rudder/inventories/incoming and /var/rudder/inventories/accepted-nodes-updates.

The main observable consequence is that new nodes take several hours to appear in "pending node" in the UI.

Analysis:

This is an old ticket that became relevant again in 6.2. The underlying root cause was that the number of software entries grew linearly with the number of Debian-like inventories processed (because of a bug in a check condition on the "source name" and "source version" attributes, which are only present on Debian-like systems, since 6.2).

After some time, the huge number of software entries made the `merge_uuid` action much slower. The count was in the millions for a middle-sized infrastructure of just 500 Linux nodes, versus ~50k in normal operational conditions. That is in the range of a 10k-node installation, but without the correspondingly sized root server.

The PR fixing this ticket also adds a new API endpoint to clean up unused software entries (see the API doc for system/maintenance/purgeSoftware). The purge is triggered every night, but you might want to trigger it right after installing Rudder 6.2.1 to start reducing the number of garbage software entries as soon as possible.
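
As a minimal sketch, the purge could be triggered by hand with curl. Only the `system/maintenance/purgeSoftware` path comes from this ticket; the `POST` method, the `/api/latest` prefix, the `X-API-Token` header, the server URL and the token value are assumptions to check against the API documentation of your Rudder version.

```shell
#!/bin/sh
# Assumptions (not from the ticket): server URL, API prefix, token header, POST method.
RUDDER_URL="${RUDDER_URL:-https://rudder.example.com/rudder}"
API_TOKEN="${API_TOKEN:-changeme}"

# Build the endpoint URL; only "system/maintenance/purgeSoftware" comes from the ticket.
# A single trailing slash on the base URL is dropped to avoid "//" in the result.
purge_software_url() {
  printf '%s/api/latest/system/maintenance/purgeSoftware\n' "${1%/}"
}

# Ask the server to start the purge batch (fails on HTTP errors thanks to --fail).
trigger_purge() {
  curl --silent --show-error --fail \
       --request POST \
       --header "X-API-Token: ${API_TOKEN}" \
       "$(purge_software_url "$RUDDER_URL")"
}

# Uncomment to actually send the request:
# trigger_purge
```

Keeping the call commented out makes it possible to source the file and check the generated URL before sending anything to the server.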

History:

With Rudder 4.1, some inventories are painfully slow to process.
Checking the logs at TRACE level shows:

[2018-07-11 16:50:35] DEBUG com.normation.inventory.services.provisioning.NodeInventoryDNFinderService - Server Id 'NodeId(YYYYYYY-YYYYYYY-YYYYYYY-YYYYYYY-YYYYYYY)' found in DIT 'AcceptedInventory' with id finder 'use_existing_id'
[2018-07-11 16:50:35] DEBUG com.normation.inventory.services.provisioning.MachineDNFinderService - Processing machine id finder use_existing_id
[2018-07-11 16:50:35] DEBUG com.normation.inventory.services.provisioning.MachineDNFinderService - Processing machine id finder check_mother_board_uuid_accepted
[2018-07-11 16:50:35] DEBUG com.normation.inventory.services.provisioning.MachineDNFinderService - Machine Id 'MachineUuid(XXXXX-XXXXX-XXXXX-XXXXX-XXXXX)' found with id finder 'check_mother_board_uuid_accepted'
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Precommit 'pre_commit_inventory:merge_uuid': 30645 ms
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Precommit 'pre_commit_inventory:check_machine_cn': 0 ms
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Precommit 'pre_commit_inventory:set_last_inventory_date': 0 ms
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Precommit 'pre_commit_inventory:add_ip_values': 0 ms
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Precommit 'pre_commit_inventory:log_inventory': 0 ms
[2018-07-11 16:50:35] TRACE com.normation.inventory.ldap.provisioning.DefaultReportSaver - Pre commit report: 30647 ms

There is no reason for such a check to be so slow.
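
To spot such slow steps in a larger log, a small filter can help. This is a sketch that assumes the "Precommit '<name>': <n> ms" line layout shown in the trace above; the function name `slow_precommits` and the default threshold are made up for illustration.

```shell
#!/bin/sh
# Print "<ms> <step name>" for every pre-commit step slower than a threshold (in ms).
# Reads webapp log lines on stdin; assumes the TRACE line layout shown in this ticket.
slow_precommits() {
  threshold_ms="${1:-1000}"
  awk -v limit="$threshold_ms" '
    $(NF) == "ms" && / Precommit / {
      ms = $(NF - 1) + 0                 # the number just before the trailing "ms"
      name = $(NF - 2)                   # quoted step name followed by a colon
      gsub(/[^A-Za-z0-9_:]/, "", name)   # drop the surrounding quotes
      sub(/:$/, "", name)                # drop the trailing colon
      if (ms > limit) print ms, name
    }'
}
```

For example, `cat /var/log/rudder/webapp/* | slow_precommits 1000` would list the steps that took more than one second, provided the logs were written at TRACE level and are not compressed.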

WORKAROUND

It may help to delete unreferenced software. The batch that should do that may be failing: you can grep unreferenced /var/log/rudder/webapp/* to see whether there are errors, or a lot of them.

You can do it manually:

So, first, MAKE A BACKUP OF YOUR LDAP: https://docs.rudder.io/reference/6.2/administration/procedures.html#_migration_backups_and_restores

- create a working directory and go into it. Then:
- all software:

ldapsearch -LLL -o ldif-wrap=no -h localhost -p 389 -x -D "cn=Manager,cn=rudder-configuration" -w LDAP_PASS_FROM_rudder-passwords_file -b "ou=Software,ou=Inventories,cn=rudder-configuration" -s one 1.1 | sort | uniq | cut -d: -f2 > all-soft-sorted.dns

- software used in accepted nodes:
ldapsearch -LLL -o ldif-wrap=no -h localhost -p 389 -x -D "cn=Manager,cn=rudder-configuration" -w LDAP_PASS -b "ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration" -s one 'nodeId=*' software | grep softwareId | sort | uniq | cut -d: -f2 > nodes-soft-sorted.dns

- software to delete (i.e. all minus nodes):
grep -f nodes-soft-sorted.dns -v all-soft-sorted.dns > soft-to-delete.dns

- then delete unreferenced:
ldapdelete -h localhost -p 389 -x -D "cn=Manager,cn=rudder-configuration" -w LDAP_PASS -c -f soft-to-delete.dns
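
Before running the ldapdelete, it can be worth cross-checking the list of entries to delete. The sketch below is not part of the procedure above: it computes the same "all minus nodes" difference with `comm`, which compares whole lines instead of treating each DN as a regular expression. The function name `unreferenced_dns` is hypothetical.

```shell
#!/bin/sh
# Print the lines of file $1 that do not appear in file $2 (set difference "all - nodes").
# Both inputs are re-sorted in the C locale because comm needs identically ordered input.
unreferenced_dns() {
  all_file="$1"
  used_file="$2"
  tmp_all=$(mktemp) && tmp_used=$(mktemp) || return 1
  LC_ALL=C sort -u "$all_file"  > "$tmp_all"
  LC_ALL=C sort -u "$used_file" > "$tmp_used"
  # -23 hides lines unique to the second file and lines common to both,
  # leaving only the lines unique to the first file.
  LC_ALL=C comm -23 "$tmp_all" "$tmp_used"
  rm -f "$tmp_all" "$tmp_used"
}
```

Running `unreferenced_dns all-soft-sorted.dns nodes-soft-sorted.dns | wc -l` and comparing the count with `wc -l soft-to-delete.dns` gives a quick sanity check; a large gap between the two would be a reason to stop and inspect the files before deleting anything.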


Subtasks 1 (0 open, 1 closed)

Bug #18830: Missing doc for purgeSoftware API endpoint (Released, Alexis Mousset)

Related issues 2 (0 open, 2 closed)

Related to Rudder - Architecture #17128: review index for LDAP (Released, François ARMAND)
Related to Rudder - Bug #18873: purge software batch sometime terminate in error without log message (Released, Vincent MEMBRÉ)
