Bug #4769
closed
rudder-agent may get stuck because of Tokyo Cabinet database bloat
Description
We have issues with tcdb on rudder-agent: the file /var/rudder/cfengine-community/state/cf_lock.tcdb grows large, leading to corruption and slowing down or blocking rudder-agent.
We first added a check in check-rudder-agent that tests whether 8 or more cf-agent processes are running (#3928).
Then we added a check that tests whether promises were updated in the last 10 minutes (/var/rudder/cfengine-community/last_successful_inputs_update) (#4494).
After various fixes to those scripts (typos, missing files, ... #4582, #4604, #4686, #4752), we can see that those conditions are not sufficient.
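For reference, the two existing conditions can be sketched roughly as follows. This is only an illustration in Python, not the actual check-rudder-agent script. The function names are hypothetical, and treating a missing marker file as "stale" is an assumption.

```python
import os
import subprocess
import time

MAX_CF_AGENT_PROCESSES = 8        # threshold described in #3928
MAX_UPDATE_AGE_SECONDS = 10 * 60  # 10-minute window described in #4494
LAST_UPDATE_FILE = "/var/rudder/cfengine-community/last_successful_inputs_update"


def count_cf_agent(ps_output):
    """Count lines of `ps -e -o comm=` output that name a cf-agent process."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == "cf-agent")


def too_many_agents(ps_output=None):
    """Return True when 8 or more cf-agent processes are running."""
    if ps_output is None:
        # Ask ps for the command name of every process, one per line.
        ps_output = subprocess.run(
            ["ps", "-e", "-o", "comm="],
            capture_output=True, text=True, check=True,
        ).stdout
    return count_cf_agent(ps_output) >= MAX_CF_AGENT_PROCESSES


def promises_stale(path=LAST_UPDATE_FILE, now=None):
    """Return True when the marker file is older than 10 minutes.

    Assumption: a missing or unreadable marker file also counts as stale.
    """
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > MAX_UPDATE_AGE_SECONDS
    except OSError:
        return True
```

Neither condition looks at the lock database itself, so an agent that is still running and still updating its promises, only more and more slowly, passes both.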
Even with these fixes, reporting on nodes is still impacted by this bug, since the agent runs slower and slower at each execution until its run time reaches the length of the agent interval.
The other thing to note is that the agent is much more impacted when it is launched by cf-execd; when it is launched manually, results are far better.
- The agent gets slower and slower
- The agent uses a lot of resources for a longer period
- The reporting in Rudder will be broken
I think a solution would be to have a size-based check,
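A minimal sketch of what such a size-based check could look like, written in Python for illustration only. The 10 MiB threshold and the function name are hypothetical, since the ticket does not specify a limit. What to do once the limit is exceeded is left out here.

```python
import os

LOCK_DB = "/var/rudder/cfengine-community/state/cf_lock.tcdb"
MAX_LOCK_DB_BYTES = 10 * 1024 * 1024  # hypothetical threshold, not from the ticket


def lock_db_too_big(path=LOCK_DB, max_bytes=MAX_LOCK_DB_BYTES):
    """Return True when the lock database exists and is larger than max_bytes."""
    try:
        return os.path.getsize(path) > max_bytes
    except OSError:
        # Assumption: a missing or unreadable file is not considered bloated.
        return False


if __name__ == "__main__":
    if lock_db_too_big():
        print("cf_lock.tcdb exceeds the size limit")
```

Unlike the process-count and update-time checks, this one looks directly at the file that is bloating, so it does not depend on the agent having already stalled.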