Bug #4752

closed

cf_lock.tcdb is not cleaned by check-rudder-agent script when update file is older than 10 minutes

Added by Vincent MEMBRÉ over 10 years ago. Updated over 10 years ago.

Status: Released
Priority: 1 (highest)
Assignee: Jonathan CLARKE
Category: System integration

Description

Despite all our fixes to prevent agents from being stuck by Tokyo Cabinet databases, in some cases agents still get stuck...

Our solution is to check every 5 minutes whether agents are piling up and, if there are more than 8 agents, clean the databases.
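As a rough illustration of the process-count check described above, here is a minimal sketch. The function names and the `MAX_AGENTS` variable are hypothetical, and this is not the actual content of check-rudder-agent; it only assumes that `pgrep` is available on the node.

```shell
#!/bin/sh
# Sketch only: count_agents, too_many_agents and MAX_AGENTS are
# hypothetical names, not taken from check-rudder-agent.

MAX_AGENTS=8   # threshold mentioned in the description

# Print the number of running cf-agent processes.
# pgrep prints one PID per line; wc -l counts them (0 if none match).
count_agents() {
    pgrep -x cf-agent 2>/dev/null | wc -l | tr -d ' '
}

# Succeed (exit status 0) when the given count is above the threshold.
too_many_agents() {
    [ "$1" -gt "$MAX_AGENTS" ]
}
```

A cron job running every 5 minutes could then call `too_many_agents "$(count_agents)"` and clean the databases only when it succeeds.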

However, Dennis Cabooter reported on IRC that, even with the latest fixes, his agents were blocked again, and that our fixes lead to lots of other problems (particularly cron sending tons of mails...).

It turns out that Dennis had put a 128 MB tmpfs in place, which fills up because of cf_lock.tcdb, making agents fail immediately instead of piling up.

To fix the problem, Dennis put in place a cron job designed by Olivier Mauras that removes the tcdb files if cf_lock.tcdb grows over 10 MB:

if [ `ls -l /var/rudder/cfengine-community/state/cf_lock.tcdb | cut -f 5 -d " "` -gt "10485760" ]; then rm -rf /var/rudder/cfengine-community/state/* && /opt/rudder/bin/cf-agent -KI > /dev/null 2>&1; fi

With it in place, he has not had any further problems with the databases.

A solution could be to replace our check with this one in our check-rudder-agent script.
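If the size check were moved into check-rudder-agent, it might look something like the sketch below. The function names and parameters are hypothetical. It reads the size with `wc -c` rather than cutting fields out of `ls -l`, because `ls -l` can pad its columns with several spaces, which shifts the field that `cut -d " "` picks.

```shell
#!/bin/sh
# Sketch only: file_size_bytes and clean_if_oversized are hypothetical
# helpers, not part of the existing check-rudder-agent script.

# Print the size of a file in bytes, or 0 if it does not exist.
file_size_bytes() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Remove everything in state_dir when its cf_lock.tcdb exceeds max_bytes.
# Returns 0 if a cleanup happened, 1 otherwise.
clean_if_oversized() {
    state_dir=$1
    max_bytes=$2
    lock_file="$state_dir/cf_lock.tcdb"
    if [ "$(file_size_bytes "$lock_file")" -gt "$max_bytes" ]; then
        # ${state_dir:?} aborts instead of expanding to "/*" if the
        # variable is empty.
        rm -rf "${state_dir:?}"/*
        return 0
    fi
    return 1
}
```

With the paths and threshold from the cron job above, a call could be `clean_if_oversized /var/rudder/cfengine-community/state 10485760 && /opt/rudder/bin/cf-agent -KI > /dev/null 2>&1`. The threshold is passed as an argument so it can be tuned once a sensible value is known.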

But 10 MB seems quite arbitrary to me, and we need some feedback on the size of the files:

That file can grow to more than 100 MB, and I don't know at what size it starts blocking the agent.

What I'd like to know is the growth rate of those files, to determine which value would be a good threshold.

So I ask you one thing, dear community: can you please post here the results of the following command from several nodes?

ls -lh /var/rudder/cfengine-community/state

What I'd like to know most is the state of those files (particularly cf_lock.tcdb) a few hours after they were cleaned.
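For anyone who wants to report a growth rate rather than a single listing, one possible approach is to sample the file size periodically and compare the first and last samples. The sketch below is only an illustration: the function names and the log format are hypothetical, and it assumes a `date` that supports `+%s`.

```shell
#!/bin/sh
# Sketch only: record_size and growth_per_hour are hypothetical helpers.

# Append one "epoch_seconds size_bytes" sample for a file to a log file.
record_size() {
    file=$1
    log=$2
    size=$(wc -c < "$file" | tr -d ' ')
    printf '%s %s\n' "$(date +%s)" "$size" >> "$log"
}

# Print the average growth in bytes per hour between the first and
# last samples of the log (0 if there are fewer than two timestamps).
growth_per_hour() {
    awk 'NR == 1 { t0 = $1; s0 = $2 }
         { t1 = $1; s1 = $2 }
         END {
             if (t1 > t0) printf "%d\n", (s1 - s0) * 3600 / (t1 - t0)
             else print 0
         }' "$1"
}
```

For example, `record_size /var/rudder/cfengine-community/state/cf_lock.tcdb /tmp/cf_lock_sizes.log` could be run from cron for a few hours after a cleanup (the log path is an arbitrary choice), followed by `growth_per_hour /tmp/cf_lock_sizes.log`.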


Related issues 7 (0 open, 7 closed)

Related to Rudder - Bug #4686: Typo in /opt/rudder/bin/check-rudder-agent, prevent cleaning of cf-lock and floods with cron mails | Released | Jonathan CLARKE | 2014-03-28
Related to Rudder - Bug #4604: Typo in the deletion of lock file if the promises are not updated | Released | Jonathan CLARKE | 2014-03-12
Related to Rudder - Bug #4582: Last update detection is broken, causing cron remove cf_lock database and flood with emails every 5 minutes | Released | Jonathan CLARKE | 2014-03-11
Related to Rudder - Bug #4494: Accumulation of cf-agent processes due to locking on CFEngine tcdb lock file | Released | Jonathan CLARKE
Related to Rudder - Bug #4408: Sometimes there are too many cf-agent processes running | Rejected | Nicolas CHARLES | 2014-01-27
Related to Rudder - Bug #3928: Sometimes CFEngine get stuck because of locks on TokyoCabinet | Released | Jonathan CLARKE | 2013-09-13
Related to Rudder - Bug #4769: rudder-agent may be stucked by tokyo cabinet database bloating | Released | Jonathan CLARKE | 2014-04-23
