Monitor unreferenced software size
The unreferenced software batch does not always work (see the long discussion here: https://gitter.im/normation/rudder?at=5ff7680c787d8f79c8e9e292 and the previous couple of hours). When it fails, we observe very long inventory processing (https://issues.rudder.io/issues/12937).
So we should monitor the number of unreferenced software entries. To do that, and to also have an actionable debug lever to provide, I propose to:
- add to the deleteUnreferencedSoftware method in SoftwareServiceImpl a step that writes the list of unreferenced software ("extraSoftware") to a file (for example: /var/rudder/metrics/software/unreference-software.txt), one DN per line, overwritten at each run of the batch,
- monitor the number of lines in that file (if the file does not exist, perhaps we should also grep the webapp log file for errors regarding "unreference software"): if that number is higher than, say, 1000, issue a warning, and a critical above 10000,
- the warning message would tell the user to make a backup (link to doc: https://docs.rudder.io/reference/6.2/administration/procedures.html#_migration_backups_and_restores) and then run:
ldapdelete -h localhost -p 389 -x -D "cn=Manager,cn=rudder-configuration" -w LDAP_PASS -f /var/rudder/metrics/software/unreference-software.txt
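
For the monitoring step, here is a minimal sketch of what the check could look like. The function name check_unreferenced_software and the Nagios-style exit codes (0 OK, 1 warning, 2 critical, 3 unknown) are assumptions for illustration, not something that exists in Rudder today; the file path and the 1000/10000 thresholds are the ones proposed above.

```shell
#!/bin/sh
# Sketch only: the function name and exit codes are assumptions
# (Nagios-style: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
check_unreferenced_software() {
  file="${1:-/var/rudder/metrics/software/unreference-software.txt}"
  warn="${2:-1000}"    # warning above this many entries
  crit="${3:-10000}"   # critical above this many entries

  # A missing file may mean the batch itself failed.
  if [ ! -r "$file" ]; then
    echo "UNKNOWN: $file not readable, check the webapp log for errors"
    return 3
  fi

  # One DN per line: count non-empty lines. grep -c exits with status 1
  # when the count is 0, hence the '|| true'.
  count=$(grep -c . "$file" || true)

  if [ "$count" -gt "$crit" ]; then
    echo "CRITICAL: $count unreferenced software entries (> $crit)"
    return 2
  elif [ "$count" -gt "$warn" ]; then
    echo "WARNING: $count unreferenced software entries (> $warn), back up then clean up with ldapdelete"
    return 1
  fi
  echo "OK: $count unreferenced software entries"
  return 0
}
```

Called without arguments, the function uses the proposed path and thresholds; a wrapper script would end with `check_unreferenced_software; exit $?` so that the monitoring system sees the status code.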