Bug #4408

closed

Sometimes there are too many cf-agent processes running

Added by Dennis Cabooter almost 11 years ago. Updated almost 11 years ago.

Status:
Rejected
Priority:
N/A
Category:
-

Description

[root@node ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3824       3798         25          0          0         13
-/+ buffers/cache:       3784         39
Swap:         2047       2047          0

[root@node ~]# ps wwwuax | grep cf-|wc -l
5988

[root@node ~]# kill -9 `ps wwwuax | grep cf- | awk '{ print $2 }'`
-bash: kill: (8484) - No such process

[root@node ~]# ps wwwuax | grep cf-|wc -l
5

[root@node ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3824        344       3479          0          2        188
-/+ buffers/cache:        153       3670
Swap:         2047         61       1986
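A note on the counts above: `grep cf-` can match its own entry in the `ps` output, which may inflate the totals by one and would explain the `No such process` message from `kill` (the `grep` had already exited by then). Below is a minimal sketch of a count that avoids this by comparing exact process names. The helper name `count_exact` is hypothetical and is not part of Rudder or CFEngine.

```shell
#!/bin/sh
# Hypothetical helper: read process names on stdin (one per line) and
# print how many are exactly equal to the name given as first argument.
# Because the comparison is exact, helper processes such as grep or awk
# are never counted.
count_exact() {
    awk -v name="$1" '$1 == name { c++ } END { print c + 0 }'
}
```

For example, `ps -e -o comm= | count_exact cf-agent` prints the number of running cf-agent processes.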

Related issues 2 (0 open, 2 closed)

Related to Rudder - Bug #4752: cf_lock.tcdb is not cleaned by check-rudder-agent script when update file is older than 10 minutes (Released, Jonathan CLARKE, 2014-04-11)
Is duplicate of Rudder - Bug #3928: Sometimes CFEngine get stuck because of locks on TokyoCabinet (Released, Jonathan CLARKE, 2013-09-13)
Actions #1

Updated by Vincent MEMBRÉ almost 11 years ago

  • Assignee set to Nicolas CHARLES
  • Target version set to 2.8.3

That is a very large number of processes.

Is it happening only on the root server?

A fix for #3928 was merged this weekend that could fix this problem (it may be linked to the TokyoCabinet locks).

Nicolas, what do you think?

As I suspect tcdb, I am targeting branch 2.8.

Actions #2

Updated by Dennis Cabooter almost 11 years ago

Sometimes it happens on the server, sometimes on the nodes. This time it was on a node. As you can understand, about 6000 cf-agent processes absorb almost all the memory and make the node unusable. The nodes (including the server) report the NoAnswer state when this happens.

Actions #3

Updated by Jonathan CLARKE almost 11 years ago

  • Status changed from New to Rejected

This looks very much like a duplicate of #3928, and even if it is not, the fix for #3928 will fix this: the cron script that is run every 5 minutes checks if there are more than 8 CFEngine processes running, and kills them if so. It also cleans up the TokyoCabinet cf_lock.tcdb database, which is the cause of this.

Thanks for the report Dennis. I'm closing this ticket as a duplicate of #3928, which will be released in 2.8.3 and 2.9.3 very shortly.
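The check described above can be sketched roughly as follows. This is only an illustration and not the actual Rudder cron script: the function names are hypothetical, the default path of `cf_lock.tcdb` is an assumption, and treating every process whose name starts with `cf-` as a CFEngine process is also an assumption.

```shell
#!/bin/sh
# Sketch only: not the actual Rudder cron script.

MAX_PROCS=8   # threshold quoted in the comment above
# Assumed default location of the lock database; override via LOCK_DB.
LOCK_DB="${LOCK_DB:-/var/rudder/cfengine-community/state/cf_lock.tcdb}"

# Print the number of running processes whose name starts with "cf-".
count_cf_procs() {
    ps -e -o comm= | awk '$1 ~ /^cf-/ { c++ } END { print c + 0 }'
}

# Succeed (exit status 0) only when the given count exceeds MAX_PROCS.
too_many() {
    [ "$1" -gt "$MAX_PROCS" ]
}

# Kill all cf-* processes and remove the lock database.
cleanup() {
    pkill -9 '^cf-'
    rm -f "$LOCK_DB"
}

# One watchdog pass, intended to be run from cron every 5 minutes.
watchdog() {
    if too_many "$(count_cf_procs)"; then
        cleanup
    fi
}
```

To use it, a script would call `watchdog` as its last line and be scheduled from cron; nothing is killed or deleted until that call is made.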
