Bug #3654
closed
Rudder cron file contains error until the use of CFEngine and will display error into /var/mail for root
Added by Dennis Cabooter over 11 years ago. Updated over 9 years ago.
Description
The following cron job sometimes causes errors on some nodes:
Cron <root@node> if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi
The error is as follows:
/bin/sh: ${g.rudder_base}/etc/disable-agent: bad substitution
Updated by Vincent MEMBRÉ over 11 years ago
- Category set to Techniques
- Assignee set to Vincent MEMBRÉ
- Priority changed from N/A to 2
It seems that a CFEngine variable has not been expanded. I'll look into this!
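To illustrate why the shell reports "bad substitution" here, the following is a minimal sketch, not taken from the thread: `${g.rudder_base}` is CFEngine variable syntax, and a POSIX shell rejects it as a parameter expansion because a dot cannot appear in a variable name. The helper name `check_expansion` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a snippet in a child shell and report through the
# exit status whether that shell accepted it. A child shell is used so that
# an expansion error does not abort the calling script.
check_expansion() {
    sh -c "$1" >/dev/null 2>&1
}

# ${rudder_base} is a valid (if unset) shell variable; ${g.rudder_base} is not,
# because "." is not allowed in a shell variable name.
for snippet in ': ${rudder_base}' ': ${g.rudder_base}'; do
    if check_expansion "$snippet"; then
        echo "accepted: $snippet"
    else
        echo "rejected: $snippet"
    fi
done
```

Since cron passes the whole line to /bin/sh, a template line that still contains `${g.rudder_base}` should fail in the same way on every run.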
Updated by Vincent MEMBRÉ over 11 years ago
- Status changed from New to Discussion
- Assignee changed from Vincent MEMBRÉ to Dennis Cabooter
Does this happen on all nodes, or just a subset of them?
Is there any link between them? Same OS?
Does it happen on every run, or is it sometimes expanded correctly and sometimes not?
The template should have been expanded correctly.
Are all nodes using the same agent version as the server?
Updated by Dennis Cabooter over 11 years ago
It happens sometimes on a few nodes. This time it doesn't stop anymore. They do have the same OS; RHEL 5. The clients use the same agent version as the server.
Question: there's a /etc/cron.d/rudder-agent file, but the same entry also exists in /etc/crontab. Is this correct? And if so, what is the reason for it?
Updated by Dennis Cabooter over 11 years ago
# tail -1 /etc/crontab
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf \&\& /var/rudder/cfengine-community/bin/cf-agent; fi
# cat /etc/cron.d/rudder-agent
# Cron file for Rudder
#
# Will manually run cf-agent in case cf-execd is no longer running. cf-agent will fire up a new cf-execd.
#
# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
# Don't forget to remove that file when you're done!
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi
Updated by Nicolas CHARLES over 11 years ago
- Subject changed from chmod: cannot access `/var/rudder/cfengine-community/ppkeys/*' to ${g.rudder_base} in the crontab files
Updated by Dennis Cabooter over 11 years ago
Here are the contents of /var/rudder/cfengine-community/inputs/common/1.0/process_matching.cf:
#####################################################################################
# Copyright 2011 Normation SAS
#####################################################################################
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, Version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
#####################################################################################

bundle agent process_matching
{
  vars:

      # This deliberately excludes cf-execd which is handled separately below
      "cf_components" slist => { "cf-key",
                                 # "cf-monitord", Disabled
                                 "cf-promises", "cf-report", "cf-runagent", "cf-serverd" };

    windows::
      "stop_signal" string => "kill";

    !windows::
      "stop_signal" string => "term";

  classes:

      "restart_cf" expression => "Hr05.Min00_05.!disable_agent";

  processes:

    windows::
      # Always stop cf-monitord
      "${g.escaped_workdir}\/bin\/cf-monitord"
        signals => { "${stop_signal}" };

    linux::
      # Always stop cf-monitord
      "${sys.workdir}/bin/cf-monitord"
        signals => { "${stop_signal}" };

    restart_cf.!policy_server::
      "${cf_components}"
        signals => { "${stop_signal}" };

    # Policy servers have both Nova and Community, don't blindly kill the wrong processes
    restart_cf.policy_server::
      "${sys.workdir}/bin/${cf_components}"
        signals => { "${stop_signal}" };

    restart_cf.!windows::
      "${sys.workdir}/bin/cf-execd"
        signals => { "${stop_signal}" };

  commands:

    restart_cf.!windows::
      "${sys.cf_serverd}";
      "${sys.cf_execd}";

  files:

    linux::
      # This is to cleanup /etc/crontab from pre-2.5 usage
      # It can be removed before 2.6
      "${g.crontab}"
        edit_line => cron_cleanup;

    community_edition::
      "/etc/cron.d/rudder-agent"
        create => "true",
        perms => mog("644", "root", "root"),
        edit_defaults => empty,
        edit_line => expand_template("${sys.workdir}/inputs/common/cron/rudder_agent_community_cron");

  reports:

    restart_cf::
      "Reloaded configuration of all Cfengine components";
}

# This is to cleanup /etc/crontab from pre-2.5 usage
# It can be removed before 2.6
bundle edit_line cron_cleanup
{
  # Remove old lines to replace them with new version
  delete_lines:
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-agent -f failsafe.cf && /var/cfengine/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep -E \"\(cf-execd\|cf-agent\)\" \| grep -E \"/var/cfengine/bin/\(cf-execd\|cf-agent\)\" \| grep -v grep \| wc -l\` -eq 0 ]; then /var/cfengine/bin/cf-agent -f failsafe.cf && /var/cfengine/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep -E \"\(cf-execd\|cf-agent\)\" \| grep -E \"/var/rudder/cfengine-community/bin/\(cf-execd\|cf-agent\)\" \| grep -v grep \| wc -l\` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi";
}
Updated by Nicolas CHARLES over 11 years ago
OK, so it seems to be related to a packaging or upgrade problem:
/var/rudder/cfengine-community/bin/cf-agent: error while loading shared libraries: libtokyocabinet.so.9: cannot open shared object file: No such file or directory
Reinstalling the rudder-agent package solved the problem.
Could it be that some promises got corrupted, or that the initial promises kicked in during the upgrade, and the agent then broke and never repaired the content of the file?
Updated by Dennis Cabooter over 11 years ago
This problem occurs on all RHEL nodes without exception. On these nodes reinstalling rudder-agent solves the problem.
Updated by Vincent MEMBRÉ over 11 years ago
- Subject changed from ${g.rudder_base} in the crontab files to When migrating a RHEL5 node to Rudder 2.6, CFEngine 3.4.4 is not correctly installed (libtokyo is missing)
- Status changed from Discussion to 8
Libtokyocabinet is not installed when rudder-agent is upgraded on RHEL5.
This is due to the behavior of the rpm package on upgrade.
On an rpm upgrade, the build part is not run (see http://www.ibm.com/developerworks/library/l-rpm2/).
That part does all the preparation for CFEngine 3.4 (including libtokyo) and is not called, which is why RHEL5 nodes migrated from Rudder 2.5 cannot use CFEngine 3.4.
Reinstalling the agent works around this bug, but we have to fix the rpm package to handle that case.
Updated by Vincent MEMBRÉ over 11 years ago
I just did a migration on CentOS 5 (5.7), and it worked like a charm.
Besides, the build section is only used to build the package, so it should not be the reason why this happens.
I still need to reproduce that bug...
Updated by Nicolas PERRON over 11 years ago
- Target version changed from 2.6.3 to 2.6.4
Updated by Andrew Cranson over 11 years ago
This is also happening on new installations of rudder-agent-2.7.0.release-1.EL.5 on CentOS release 5.5 (Final) and rudder-agent-2.7.0.release-1.EL.6.x86_64 on CentOS release 6.3 (Final).
CT-9123-bash-3.2# cat /etc/cron.d/rudder-agent
# Cron file for Rudder
#
# Will manually run cf-agent in case cf-execd is no longer running. cf-agent will fire up a new cf-execd.
#
# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
# Don't forget to remove that file when you're done!
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi
The email from cron has the subject:
Cron <root@cdp-automation> if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi
And the following content:
/bin/sh: ${g.rudder_base}/etc/disable-agent: bad substitution
This is when using rudder 1.7.0-release1 as policy server (COS6) and agent. These are new nodes registered in rudder, not migrated nodes.
Updated by Vincent MEMBRÉ over 11 years ago
- Assignee changed from Dennis Cabooter to Nicolas PERRON
Thank you, Andrew, for reporting your installation error. It seems we really do have a problem here and need to dig deeper into it.
Nicolas, can you look into this, please?
Updated by Andrew Cranson over 11 years ago
Thanks. I'll run some more tests today and try to find a reliably reproducible test case for you (it was late last night when I was working on this).
Updated by Nicolas PERRON over 11 years ago
- Status changed from 8 to Discussion
- Assignee changed from Nicolas PERRON to Dennis Cabooter
Andrew Cranson wrote:
Thanks. I'll run some more tests today and try to find a reliably reproducible test case for you (it was late last night when I was working on this).
I've tried to reproduce this, without success. Did you install another version of Rudder before?
Do you have any other Rudder entries in cron?
grep -r "rudder" /etc/cron*
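A narrower check than the grep above is to search specifically for CFEngine variables that were left unexpanded. The sketch below is an assumption added for illustration, not something proposed in the thread; the helper name `has_unexpanded_vars` is hypothetical, and the pattern only covers the `${scope.name}` form seen in this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rudder): succeed if the given file still
# contains a CFEngine-style variable such as ${g.rudder_base} or ${sys.workdir}.
has_unexpanded_vars() {
    grep -qE '[$][{][a-z_]+[.][a-z_]+[}]' "$1"
}

# Check the two cron locations mentioned in this issue, skipping missing files.
for f in /etc/cron.d/rudder-agent /etc/crontab; do
    if [ -f "$f" ] && has_unexpanded_vars "$f"; then
        echo "unexpanded CFEngine variable in $f"
    fi
done
```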
Updated by Nicolas PERRON over 11 years ago
- Assignee changed from Dennis Cabooter to Nicolas PERRON
Updated by Andrew Cranson over 11 years ago
This is the first time Rudder was installed, and there are no other entries:
CT-9123-bash-3.2# grep -r "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi
CT-9123-bash-3.2# grep rudder /var/log/yum.log
Aug 19 22:23:48 Installed: 1398866025:rudder-agent-2.7.0.release-1.EL.5.i386
CT-9123-bash-3.2#
I'm trying some test cases now. I'll update you today.
Updated by Andrew Cranson over 11 years ago
(P.S. I checked the yum.log files going back to when the server was created: Rudder was never installed on this server before.)
Updated by Nicolas PERRON over 11 years ago
Andrew Cranson wrote:
(p.s. I checked yum.log's since the server was created - rudder was never installed on this server)
OK, and is the problem still present?
Updated by Andrew Cranson over 11 years ago
Yes. I have a reproducible test case.
This is what I did exactly, step-by-step:
- Install a new Linux VPS with CentOS 6 x86_64 (on Parallels Cloud Server)
- Install rudder-agent repo with the following script:
#!/bin/sh
function install_rudder_repo() {
    [ -f /etc/yum.repos.d/rudder.repo ] && return
    local dist_name=''
    local dist_version=''
    if [ -f /etc/redhat-release ]; then
        dist_name="CentOS"
        dist_version=`cat /etc/redhat-release | sed -e 's/^.\+\([0-9]\+\.[0-9]\+\).\+$/\1/g'`
    fi
    [ "$dist_name" = "" ] && return
    if [ "$dist_name" = "CentOS" ]; then
        echo $dist_version | grep "^5." > /dev/null
        [ $? -eq 0 ] &&
            echo "[Rudder_2.7]" > /etc/yum.repos.d/rudder.repo &&
            echo "name=Rudder 2.7 Repository RHEL5" >> /etc/yum.repos.d/rudder.repo &&
            echo "baseurl=http://www.rudder-project.org/rpm-2.7/RHEL_5/" >> /etc/yum.repos.d/rudder.repo &&
            echo "gpgcheck=0" >> /etc/yum.repos.d/rudder.repo &&
            echo "" >> /etc/yum.repos.d/rudder.repo
        echo $dist_version | grep "^6." > /dev/null
        [ $? -eq 0 ] &&
            echo "[Rudder_2.7]" > /etc/yum.repos.d/rudder.repo &&
            echo "name=Rudder 2.7 Repository RHEL6" >> /etc/yum.repos.d/rudder.repo &&
            echo "baseurl=http://www.rudder-project.org/rpm-2.7/RHEL_6/" >> /etc/yum.repos.d/rudder.repo &&
            echo "gpgcheck=0" >> /etc/yum.repos.d/rudder.repo &&
            echo "" >> /etc/yum.repos.d/rudder.repo
    fi
}
install_rudder_repo
- yum clean all
- yum install rudder-agent -y
- yum update rudder-agent -y
#echo "our policy server IP" > /var/rudder/cfengine-community/policy_server.dat
#iptables -I INPUT -s our-policy-server-ip/32 -j ACCEPT
#/etc/init.d/rudder-agent restart
However, as you can see, the cron file is incorrect:
CT-10150-bash-4.1# grep -r "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi
After a few minutes, the cron file corrects itself (without me doing anything):
CT-10150-bash-4.1# grep -r "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf \&\& /var/rudder/cfengine-community/bin/cf-agent; fi
Would you like me to privately send you root login details to see this for yourself? (This is a test Linux VPS - CentOS 6 x86_64) If so, please confirm how to send this.
I've successfully reproduced this 5 times on the same system so I think it will be easy to reproduce if I give you access.
Thanks
Updated by Nicolas PERRON over 11 years ago
Andrew Cranson wrote:
Yes. I have a reproducible test case.
This is what I did exactly, step-by-step:
- Install a new Linux VPS with CentOS 6 x86_64 (on Parallels Cloud Server)
- Install rudder-agent repo with the following script:
[...]
- yum clean all
- yum install rudder-agent -y
- yum update rudder-agent -y
#echo "our policy server IP" > /var/rudder/cfengine-community/policy_server.dat
#iptables -I INPUT -s our-policy-server-ip/32 -j ACCEPT
#/etc/init.d/rudder-agent restart
However, as you can see cron is incorrect:
[...]
After a few minutes, the cron file corrects itself (without me doing anything):
[...]
Would you like me to privately send you root login details to see this for yourself? (This is a test Linux VPS - CentOS 6 x86_64) If so, please confirm how to send this.
I've successfully reproduced this 5 times on the same system so I think it will be easy to reproduce if I give you access.
Thanks
OK, I see what the problem is. This is a duplicate of #3600.
Updated by Nicolas PERRON over 11 years ago
- Status changed from Discussion to In progress
I will treat this issue as the primary one and #3600 as its duplicate.
So the problem persists until the node first communicates with a server. It lies in the initial promises.
I'm on it.
Updated by Nicolas PERRON over 11 years ago
Got it.
This problem came from #3204, which added /etc/cron.d/rudder-agent as a packaged file. However, that file is a template with unexpanded variables, so it will not work until CFEngine has run.
The only solution I see is to use sed to replace the variables in the file.
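As a rough illustration of that idea (a sketch only, not the content of the actual pull request), the substitution could look like the following. The helper name `expand_rudder_cron` is hypothetical, and the two default paths are the values that appear in the correctly expanded cron file earlier in this issue.

```shell
#!/bin/sh
# Hypothetical helper: print the given cron template with the two CFEngine
# variables replaced by concrete paths. The defaults are taken from the
# expanded cron line quoted in this issue; pass other values as $2 and $3.
expand_rudder_cron() {
    rudder_base="${2:-/opt/rudder}"
    workdir="${3:-/var/rudder/cfengine-community}"
    # [\$] reaches sed as [$], which matches a literal dollar sign.
    sed -e "s|[\$]{g\.rudder_base}|${rudder_base}|g" \
        -e "s|[\$]{sys\.workdir}|${workdir}|g" \
        "$1"
}
```

The function writes to standard output rather than editing in place, so a caller can redirect the result to a temporary file, inspect it, and only then move it over /etc/cron.d/rudder-agent.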
Updated by Nicolas PERRON over 11 years ago
- Subject changed from When migrating a RHEL5 node to Rudder 2.6, CFEngine 3.4.4 is not correctly installed (libtokyo is missing) to Rudder cron file contains error until the use of CFEngine and will display error into /var/mail for root
- Status changed from In progress to Pending technical review
- % Done changed from 0 to 100
- Pull Request set to https://github.com/Normation/rudder-packages/pull/99
Matthieu, could you review it, please?
Updated by Nicolas PERRON over 11 years ago
- Assignee changed from Nicolas PERRON to Matthieu CERDA
Updated by Matthieu CERDA over 11 years ago
- Assignee changed from Matthieu CERDA to Nicolas PERRON
I approve this commit, but I'm not sure that this should be merged now.
Updated by Andrew Cranson over 11 years ago
We can implement a workaround (using sed) in our installation script until this is approved, so no problem delaying the merge temporarily from our side. Thanks to you both for your help.
Updated by Jonathan CLARKE about 11 years ago
- Status changed from Pending technical review to Discussion
Nico, I added some comments in the PR, to try and simplify this. Can you address them please?
Updated by Nicolas PERRON about 11 years ago
- Assignee changed from Nicolas PERRON to Jonathan CLARKE
Jonathan CLARKE wrote:
Nico, I added some comments in the PR, to try and simplify this. Can you address them please?
PR Updated and rebased.
Updated by Nicolas PERRON about 11 years ago
- Status changed from Discussion to Pending technical review
Updated by Jonathan CLARKE about 11 years ago
- Project changed from 24 to 34
- Category deleted (Techniques)
Updated by Jonathan CLARKE about 11 years ago
- Status changed from Pending technical review to Pending release
Updated by Nicolas PERRON about 11 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 2.6.4, which was released today.
Check out:
- The release announcement: http://www.rudder-project.org/pipermail/rudder-announce/2013-September/000045.html
- The full ChangeLog: http://www.rudder-project.org/foswiki/bin/view/System/Documentation:ChangeLog26
- Download information: http://www.rudder-project.org/foswiki/Download/
Updated by Benoît PECCATTE over 9 years ago
- Project changed from 34 to Rudder
- Category set to Packaging