Bug #4769

closed

rudder-agent may get stuck because of Tokyo Cabinet database bloating

Added by Vincent MEMBRÉ almost 10 years ago. Updated almost 10 years ago.

Status: Released
Priority: 1
Category: System integration

Description

We have issues with tcdb on rudder-agent. The file /var/rudder/cfengine-community/state/cf_lock.tcdb grows large, which leads to corruption and slows down or blocks rudder-agent.

We first added a check in check-rudder-agent that tests whether 8 or more cf-agent processes are running (#3928).
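For illustration only, the process-count condition can be sketched as follows. This is not the check-rudder-agent script itself: the function names are hypothetical, and reading command names from a Linux-style /proc is an assumption made to keep the sketch self-contained.

```python
from pathlib import Path

# Threshold taken from the description: 8 or more cf-agent processes.
MAX_CF_AGENT_PROCESSES = 8


def running_command_names(proc_root="/proc"):
    """Yield the command name of each running process (assumes a Linux /proc layout)."""
    for comm in Path(proc_root).glob("[0-9]*/comm"):
        try:
            yield comm.read_text().strip()
        except OSError:
            # The process exited between listing and reading; skip it.
            continue


def too_many_agents(command_names, limit=MAX_CF_AGENT_PROCESSES):
    """Return True when `limit` or more processes are named cf-agent."""
    count = sum(1 for name in command_names if name == "cf-agent")
    return count >= limit
```

A caller would pass `running_command_names()` to `too_many_agents()` and treat a True result as a sign that agents are piling up.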

Then we added a check that tests whether promises were updated in the last 10 minutes (/var/rudder/cfengine-community/last_successful_inputs_update) (#4494).
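A minimal sketch of this freshness condition, assuming the check compares the modification time of the file with the current time. The function name is hypothetical, and treating a missing file as stale is an assumption, not something this report states.

```python
import os
import time

# 10 minutes, as stated in the description.
MAX_AGE_SECONDS = 10 * 60


def promises_are_stale(stamp_path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True when stamp_path was last modified more than max_age seconds ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(stamp_path)
    except OSError:
        # Assumption: a missing or unreadable file counts as "not updated".
        return True
    return (now - mtime) > max_age
```

It would be called with the path above, for example `promises_are_stale("/var/rudder/cfengine-community/last_successful_inputs_update")`.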

After various fixes to those scripts (typos, missing files, ...: #4582, #4604, #4686, #4752), we can see that those conditions are not sufficient.

Even with these fixes, reporting on nodes is still impacted by this bug: the agent runs slower and slower at each execution, until a run takes as long as the agent interval.
Note also that the agent is much more impacted when it is launched by cf-execd; results are far better when it is launched manually.

The effects of the big tcdb databases are:
  • The agent gets slow
  • The agent uses a lot of resources for a longer period
  • Reporting in Rudder is broken

I think a solution would be to have a size-based check.
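As a rough sketch of what a size-based check might look like: the function name and the 10 MiB threshold below are hypothetical, since this report does not give a size limit. The sketch only reports whether the file is over the limit and leaves the reaction to the caller.

```python
import os

# Hypothetical threshold; the report does not specify a size limit.
MAX_LOCK_DB_BYTES = 10 * 1024 * 1024


def lock_db_too_big(db_path, max_bytes=MAX_LOCK_DB_BYTES):
    """Return True when the file at db_path is larger than max_bytes."""
    try:
        return os.path.getsize(db_path) > max_bytes
    except OSError:
        # A missing database cannot be bloated.
        return False
```

For the file discussed here, the call would be `lock_db_too_big("/var/rudder/cfengine-community/state/cf_lock.tcdb")`.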


Related issues 7 (0 open, 7 closed)

Related to Rudder - Bug #3928: Sometimes CFEngine get stuck because of locks on TokyoCabinet (Released, Jonathan CLARKE, 2013-09-13)
Related to Rudder - Bug #4494: Accumulation of cf-agent processes due to locking on CFEngine tcdb lock file (Released, Jonathan CLARKE)
Related to Rudder - Bug #4582: Last update detection is broken, causing cron remove cf_lock database and flood with emails every 5 minutes (Released, Jonathan CLARKE, 2014-03-11)
Related to Rudder - Bug #4604: Typo in the deletion of lock file if the promises are not updated (Released, Jonathan CLARKE, 2014-03-12)
Related to Rudder - Bug #4686: Typo in /opt/rudder/bin/check-rudder-agent, prevent cleaning of cf-lock and floods with cron mails (Released, Jonathan CLARKE, 2014-03-28)
Related to Rudder - Bug #4752: cf_lock.tcdb is not cleaned by check-rudder-agent script when update file is older than 10 minutes (Released, Jonathan CLARKE, 2014-04-11)
Related to Rudder - Bug #4841: Job Scheduler Technique should not use ifelapsed to avoid running several time same job (Released, Jonathan CLARKE, 2014-05-11)