Bug #7338
Closed
All reports are missing (totally orange) for a node due to multiple cf-execd processes
Description
All reports are missing (totally orange) for a node due to multiple cf-execd processes. The logs are there and visible in the web UI.
Workaround: log in to the node, stop rudder-agent, kill -9 the cf-execd process that is still running, then start rudder-agent.
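The workaround can be sketched as a small shell function. This is a minimal sketch, assuming the init script lives at /etc/init.d/rudder-agent (as in the transcript later in this ticket) and that pgrep is available on the node; the names run and fix_stuck_cf_execd and the DRY_RUN switch are hypothetical helpers, not part of Rudder.

```shell
#!/bin/sh
# Sketch of the manual workaround; meant to be run as root on the affected node.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

fix_stuck_cf_execd() {
  run /etc/init.d/rudder-agent stop
  # Any cf-execd still listed after the stop is treated as the stray instance.
  for pid in $(pgrep -x cf-execd); do
    run kill -9 "$pid"
  done
  run /etc/init.d/rudder-agent start
}
```

Calling DRY_RUN=1 fix_stuck_cf_execd first shows what would be executed; note that in dry-run mode the agent is not actually stopped, so every running cf-execd is listed.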
Updated by Nicolas CHARLES almost 10 years ago
Dennis, what happens if you run bash -x /opt/rudder/bin/check-rudder-agent?
What is the exit code?
Updated by Dennis Cabooter almost 10 years ago
# ps wwwuax|grep cf-exec|grep -v grep
root 1679 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
root 2046 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
# bash -x /opt/rudder/bin/check-rudder-agent
+ . /etc/profile
++ '[' '' ']'
++ '[' -d /etc/profile.d ']'
++ for i in '/etc/profile.d/*.sh'
++ '[' -r /etc/profile.d/bash_completion.sh ']'
++ . /etc/profile.d/bash_completion.sh
+++ '[' -n '4.3.11(1)-release' -a -n '' -a -z '' ']'
++ for i in '/etc/profile.d/*.sh'
++ '[' -r /etc/profile.d/rudder-agent.sh ']'
++ . /etc/profile.d/rudder-agent.sh
+++ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/rudder/cfengine-community/bin:/var/rudder/cfengine-community/bin
+++ export PATH
+++ type manpath
++++ manpath
+++ MANPATH=/usr/local/man:/usr/local/share/man:/usr/share/man:/opt/rudder/share/man:/opt/rudder/share/man
+++ export MANPATH
++ unset i
+ set -e
+ export PATH=/opt/rudder/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/rudder/cfengine-community/bin:/var/rudder/cfengine-community/bin
+ PATH=/opt/rudder/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/rudder/cfengine-community/bin:/var/rudder/cfengine-community/bin
+ BACKUP_DIR=/var/backups/rudder/
++ uname -s
+ OS_FAMILY=Linux
+ CFENGINE_DB_EXT=lmdb
+ '[' zLinux = zAIX ']'
+ CP_A='cp -a'
+ CFE_DIR=/var/rudder/cfengine-community
+ CFE_BIN_DIR=/var/rudder/cfengine-community/bin
+ CFE_DISABLE_FILE=/opt/rudder/etc/disable-agent
+ LAST_UPDATE_FILE=/var/rudder/cfengine-community/last_successful_inputs_update
+ UUID_FILE=/opt/rudder/etc/uuid.hive
++ whoami
+ '[' '!' root = root ']'
+ check_and_fix_rudder_uuid
+ LATEST_BACKUPED_UUID=
+ '[' '!' -e /opt/rudder/etc/uuid.hive ']'
++ wc -l
++ grep -E '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}|root'
++ cat /opt/rudder/etc/uuid.hive
+ CHECK_UUID=1
+ '[' 1 -ne 1 ']'
+ check_and_fix_cfengine_processes
++ ps -h -o utsns --pid 5742
+ ns=4026531838
+ '[' -e /proc/bc/0 ']'
+ '[' -n 4026531838 ']'
+ PS_COMMAND='eval ps --no-header -e -O utsns | grep -E '\''^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'\'''
++ cat
++ grep -E cf-execd
++ grep -v grep
++ eval ps --no-header -e -O utsns '|' grep -E ''\''^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'\'''
+++ grep -E '^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'
+++ ps --no-header -e -O utsns
+ CF_EXECD_RUNNING=' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
++ wc -l
++ grep -v '^$'
++ echo ' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
+ NB_CF_EXECD_RUNNING=2
+ '[' 2 -gt 1 ']'
+ echo_n 'WARNING: Too many instance of CFEngine cf-execd processes running. Killing them...'
+ '[' zLinux = zAIX ']'
+ echo -n WARNING: Too many instance of CFEngine cf-execd processes running. Killing them...
WARNING: Too many instance of CFEngine cf-execd processes running. Killing them...+ xargs kill -9
+ awk 'BEGIN { OFS=" "} {print $2 }'
+ echo ' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
+ true
+ echo ' Done'
Done
++ cat
++ grep -E '/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)'
++ grep -v grep
++ eval ps --no-header -e -O utsns '|' grep -E ''\''^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'\'''
+++ grep -E '^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'
+++ ps --no-header -e -O utsns
+ CF_PROCESS_RUNNING=' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
++ wc -l
++ grep -v '^$'
++ echo ' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
+ NB_CF_PROCESS_RUNNING=2
+ '[' '!' -e /opt/rudder/etc/disable-agent -a 2 -eq 0 -a -f /var/rudder/cfengine-community/policy_server.dat ']'
+ '[' -f /var/rudder/cfengine-community/inputs/run_interval ']'
++ cat /var/rudder/cfengine-community/inputs/run_interval
+ RUN_INTERVAL=15
++ expr 15 '*' 2
+ CHECK_INTERVAL=30
+ '[' '!' -e /var/rudder/cfengine-community/last_successful_inputs_update -o -e /opt/rudder/etc/disable-agent ']'
++ find /var/rudder/cfengine-community/last_successful_inputs_update -mmin +30
+ test
+ '[' 2 -gt 8 ']'
+ check_and_fix_cf_lock
+ MAX_CF_LOCK_SIZE=10485760
+ '[' -e /var/rudder/cfengine-community/state/cf_lock.lmdb ']'
+ '[' zLinux = zAIX ']'
++ stat -c%s /var/rudder/cfengine-community/state/cf_lock.lmdb
+ CF_LOCK_SIZE=155648
+ '[' 155648 -ge 10485760 ']'
+ '[' zLinux '!=' zAIX ']'
+ check_and_fix_specific_rudder_agent_file /etc/init.d/rudder-agent init
+ FILE_TO_RESTORE=/etc/init.d/rudder-agent
+ FILE_TYPE=init
+ LATEST_BACKUPED_FILES=
+ '[' '!' -e /etc/init.d/rudder-agent ']'
+ check_and_fix_specific_rudder_agent_file /etc/default/rudder-agent default
+ FILE_TO_RESTORE=/etc/default/rudder-agent
+ FILE_TYPE=default
+ LATEST_BACKUPED_FILES=
+ '[' '!' -e /etc/default/rudder-agent ']'
+ check_and_fix_specific_rudder_agent_file /etc/cron.d/rudder-agent cron
+ FILE_TO_RESTORE=/etc/cron.d/rudder-agent
+ FILE_TYPE=cron
+ LATEST_BACKUPED_FILES=
+ '[' '!' -e /etc/cron.d/rudder-agent ']'
+ base=/var/rudder/cfengine-community/inputs
+ empty /var/rudder/cfengine-community/inputs/common/1.0/update.cf
+ '[' '!' -f /var/rudder/cfengine-community/inputs/common/1.0/update.cf ']'
++ awk '{print $1}'
++ du /var/rudder/cfengine-community/inputs/common/1.0/update.cf
+ '[' 20 = 0 ']'
+ empty /var/rudder/cfengine-community/inputs/failsafe.cf
+ '[' '!' -f /var/rudder/cfengine-community/inputs/failsafe.cf ']'
++ awk '{print $1}'
++ du /var/rudder/cfengine-community/inputs/failsafe.cf
+ '[' 8 = 0 ']'
+ empty /var/rudder/cfengine-community/inputs/promises.cf
+ '[' '!' -f /var/rudder/cfengine-community/inputs/promises.cf ']'
++ awk '{print $1}'
++ du /var/rudder/cfengine-community/inputs/promises.cf
+ '[' 36 = 0 ']'
# ps wwwuax|grep cf-exec|grep -v grep
root 1679 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
root 2046 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
# ps wwwuax|grep cf-exec|grep -v grep
root 1679 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
root 2046 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
# /etc/init.d/rudder-agent stop
rudder-agent[7161]: [INFO] Using /etc/default/rudder-agent for configuration
rudder-agent[7164]: [INFO] Using /var/rudder/cfengine-community for CFEngine workdir
rudder-agent[7165]: [INFO] Halting CFEngine Community cf-serverd...
rudder-agent[7376]: [OK] CFEngine Community cf-serverd stopped after 2 seconds
rudder-agent[7377]: [INFO] Halting CFEngine Community cf-execd...
rudder-agent[8140]: [OK] CFEngine Community cf-execd stopped after 6 seconds
# ps wwwuax|grep cf-exec|grep -v grep
root 1679 0.0 0.3 107816 3984 ? Ss 09:34 0:00 /var/rudder/cfengine-community/bin/cf-execd
# kill 1679
# ps wwwuax|grep cf-exec|grep -v grep
# /etc/init.d/rudder-agent start
rudder-agent[8902]: [INFO] Using /etc/default/rudder-agent for configuration
rudder-agent[8905]: [INFO] Using /var/rudder/cfengine-community for CFEngine workdir
rudder-agent[8906]: [INFO] Launching CFEngine Community cf-serverd...
rudder-agent[9081]: [OK] CFEngine Community cf-serverd started after 1 seconds
rudder-agent[9082]: [INFO] Launching CFEngine Community cf-execd...
rudder-agent[9258]: [OK] CFEngine Community cf-execd started after 1 seconds
# ps wwwuax|grep cf-exec|grep -v grep
root 9255 0.0 0.2 40224 2860 ? Ss 10:49 0:00 /var/rudder/cfengine-community/bin/cf-execd
Updated by Dennis Cabooter almost 10 years ago
It seems like this is only happening on Ubuntu machines, not on CentOS/RHEL ones.
Updated by Nicolas CHARLES almost 10 years ago
- Category set to Packaging
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
- Target version set to 2.11.17
OK, the problem is here:
echo -n WARNING: Too many instance of CFEngine cf-execd processes running. Killing them...
WARNING: Too many instance of CFEngine cf-execd processes running. Killing them...+ xargs kill -9
+ awk 'BEGIN { OFS=" "} {print $2 }'
+ echo ' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'
The script does detect that two cf-execd processes are running, but it does not extract the correct field for the PID.
This is probably linked to #7189 and #7243.
I could not reproduce it on CentOS or Debian 7, but on Ubuntu the value is invalid:
echo ${PS_COMMAND}
eval ps --no-header -e -O utsns | grep -E '^[[:space:]]*[[:digit:]]*[[:space:]]+4026531838'
However, I do not have namespaces; I think we should use ps -ef.
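To make the symptom concrete, the extraction step from the trace can be replayed on the captured ps output. This is only a sketch: the sample variable holds the two lines stored in CF_EXECD_RUNNING in the trace, and extract_kill_args is a hypothetical name for the awk step that feeds xargs kill -9.

```shell
#!/bin/sh
# The two lines captured in CF_EXECD_RUNNING in the trace.
# With "ps -O utsns" the columns are: PID, UTSNS, S, TTY, TIME, COMMAND.
sample=' 1679 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd
2046 4026531838 S ? 00:00:00 /var/rudder/cfengine-community/bin/cf-execd'

# Same awk program as in the trace: it prints the second field of each line.
extract_kill_args() {
  awk 'BEGIN { OFS=" "} {print $2 }'
}

# Show what would be handed to "xargs kill -9" (nothing is killed here).
printf '%s\n' "$sample" | extract_kill_args
```

Both printed values are 4026531838, the namespace identifier, rather than 1679 and 2046, which matches the trace: the two cf-execd processes are still listed after the "Killing them..." step.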
Updated by Nicolas CHARLES almost 10 years ago
- Related to Bug #7189: issues with process management on physical hosting LXC containers added
Updated by Benoît PECCATTE almost 10 years ago
Ubuntu supports namespaces, and in the previous output the command
ps -h -o utsns --pid $$
returns 4026531838 (the value in your grep), which is only possible if you have namespace support.
But I see a possible cause: ps -O utsns changes the order of the output fields, so the kill does not work.
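One way to avoid depending on the default column layout is to request the columns explicitly. With plain ps -ef the PID is the second field, which is presumably what the awk program was written for; with ps -O utsns the PID comes first. The following is a minimal sketch, not necessarily what the pull request does: cf_execd_pids is a hypothetical helper that expects ps output with the columns pid, utsns, args and prints the PIDs of cf-execd processes in the given namespace.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID UTSNS ARGS..." lines on stdin and prints
# the PIDs of cf-execd processes that belong to the namespace given as $1.
cf_execd_pids() {
  awk -v ns="$1" '$2 == ns && $3 ~ /\/cf-execd$/ { print $1 }'
}

# Intended use on a node (assumes a ps that supports the utsns column):
#   ns=$(ps -h -o utsns --pid $$)
#   ps --no-header -e -o pid,utsns,args | cf_execd_pids "$ns"
```

Because the column order is fixed by the -o list, the PID is always the first field regardless of whether namespace support is present.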
Updated by Benoît PECCATTE almost 10 years ago
- Status changed from New to In progress
Updated by Benoît PECCATTE almost 10 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder-packages/pull/783
Updated by Benoît PECCATTE almost 10 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset rudder-packages|05320e8ca678754450ac96c5f16ee47daaf668a8.
Updated by Jonathan CLARKE almost 10 years ago
Applied in changeset rudder-packages|6ab33d78a910aae58544e6a3623c7ba4d3b2ebc6.
Updated by Vincent MEMBRÉ almost 10 years ago
- Status changed from Pending release to Released
Updated by Florian Heigl over 9 years ago
I found that one also needs to modify:
[root@rudder 1.0]# git show
commit 811a3ca2e8f1342b58fb19151e720c1ffda68da8
Author: root user (CLI) <root@localhost>
Date: Wed Jan 13 00:34:57 2016 +0100
adjust for lxc env
diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index b59974c..5b6db6e 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -341,12 +341,12 @@ bundle agent check_cf_processes
# process_kill is the same for SIGKILL.
!windows::
# On windows, cf-execd is a service, and there can be only one instance of it running (by design)
- "process_term[execd]" string => "2";
- "process_kill[execd]" string => "5";
+ "process_term[execd]" string => "6";
+ "process_kill[execd]" string => "8";
any::
- "process_term[agent]" string => "5";
- "process_kill[agent]" string => "8";
+ "process_term[agent]" string => "8";
+ "process_kill[agent]" string => "16";
"binaries" slist => getindices("process_term");
This is not sufficient, since it will also raise the limits on all containers; I just don't know a more appropriate fix.
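To check whether the limits are being hit because of containers, it can help to count CFEngine processes per UTS namespace on the host. This is a rough sketch under the assumption that the host's ps also lists container processes and supports the utsns column; count_per_namespace is a hypothetical helper, not part of Rudder.

```shell
#!/bin/sh
# Hypothetical helper: reads "UTSNS ARGS..." lines on stdin and prints
# "<namespace> <count>" for cf-execd and cf-agent processes, sorted by namespace.
count_per_namespace() {
  awk '$2 ~ /\/(cf-execd|cf-agent)$/ { n[$1]++ }
       END { for (k in n) print k, n[k] }' | sort
}

# Intended use on the host:
#   ps --no-header -e -o utsns,args | count_per_namespace
```

If each namespace stays within the original thresholds while the host-wide total exceeds them, that would suggest filtering by namespace rather than raising the limits everywhere.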