Sometimes there are too many cf-agent processes running

Added by Dennis Cabooter over 10 years ago. Updated over 10 years ago.

[root@node ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3824       3798         25          0          0         13
-/+ buffers/cache:       3784         39
Swap:         2047       2047          0

[root@node ~]# ps wwwuax | grep cf-|wc -l

[root@node ~]# kill -9 `ps wwwuax | grep cf- | awk '{ print $2 }'`
-bash: kill: (8484) - No such process

[root@node ~]# ps wwwuax | grep cf-|wc -l

[root@node ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3824        344       3479          0          2        188
-/+ buffers/cache:        153       3670
Swap:         2047         61       1986
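
A side note on the "No such process" message in the transcript above: it is most likely the PID of the `grep cf-` process itself, which matches its own pattern and has already exited by the time kill runs. A minimal sketch that avoids this by matching only on the command column of the ps output; the function name `cf_pids` is hypothetical, and it assumes the usual `ps aux` layout where the command is the 11th column:

```shell
# Hypothetical helper: read `ps wwwuax` output on stdin and print the PIDs
# of processes whose command (11th column) is a cf-* binary, either as a
# bare name (cf-agent) or as a full path (/some/dir/cf-agent).
# A "grep cf-" process is skipped because its command column is "grep".
cf_pids() {
    awk '$11 ~ /(^|\/)cf-/ { print $2 }'
}

# Usage (commented out so that nothing is killed by accident):
#   ps wwwuax | cf_pids | wc -l            # count the cf-* processes
#   ps wwwuax | cf_pids | xargs kill -9    # kill them
```

Where procps is installed, `pgrep` and `pkill` should give the same result without parsing ps output.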

Related to Rudder - Bug #4752: cf_lock.tcdb is not cleaned by check-rudder-agent script when update file is older than 10 minutes (Released, Jonathan CLARKE, 2014-04-11)
Is duplicate of Rudder - Bug #3928: Sometimes CFEngine get stuck because of locks on TokyoCabinet (Released, Jonathan CLARKE, 2013-09-13)

Updated by Vincent MEMBRÉ over 10 years ago

  • Assignee set to Nicolas CHARLES
  • Target version set to 2.8.3

That is a very large number of processes.

Is it happening only on the root server?

A fix for #3928 was merged this weekend that could resolve this problem (it may be linked to the TokyoCabinet locks).

Nicolas, what do you think?

As I suspect tcdb, I am targeting branch 2.8.

Updated by Dennis Cabooter over 10 years ago

Sometimes it happens on the server, sometimes on the nodes. This time it was on a node. As you can understand, ~6000 cf-agent processes absorb almost all the memory and make the node unusable. The nodes (including the server) report the NoAnswer state when this happens.

Updated by Jonathan CLARKE over 10 years ago

  • Status changed from New to Rejected

This looks very much like a duplicate of #3928, and even if it is not, the fix for #3928 will fix this: the cron script that is run every 5 minutes checks if there are more than 8 CFEngine processes running, and kills them if so. It also cleans up the TokyoCabinet cf_lock.tcdb database, which is the cause of this.
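
For readers who want to picture the check described above, here is a rough sketch. It is not the actual cron script shipped with Rudder: the function names, the `LOCK_DB` path, and the choice to delete the lock database (rather than repair it) are all assumptions made for illustration. Only the threshold of 8 processes comes from the comment above.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the real Rudder cron script.

MAX_PROCESSES=8   # threshold quoted in the comment above
# Assumed location of the lock database; adjust to your installation.
LOCK_DB="${LOCK_DB:-/var/rudder/cfengine-community/state/cf_lock.tcdb}"

# Succeed (exit status 0) when the given count exceeds the threshold.
too_many_processes() {
    [ "$1" -gt "$MAX_PROCESSES" ]
}

# Count the processes whose name starts with "cf-".
# pgrep does not list itself, unlike "ps | grep".
count_cf_processes() {
    pgrep '^cf-' | wc -l | tr -d ' '
}

# Kill all cf-* processes and remove the lock database if there are too many.
check_and_clean() {
    count=$(count_cf_processes)
    if too_many_processes "$count"; then
        pkill -9 '^cf-'
        rm -f "$LOCK_DB"   # assumption: "cleaning" is done by deleting the file
    fi
}

# Call check_and_clean from a cron job to apply the check, for example
# every 5 minutes. It is deliberately not called here.
```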

Thanks for the report Dennis. I'm closing this ticket as a duplicate of #3928, which will be released in 2.8.3 and 2.9.3 very shortly.


