Bug #3654

closed

Rudder cron file contains error until the use of CFEngine and will display error into /var/mail for root

Added by Dennis Cabooter almost 11 years ago. Updated about 9 years ago.

Status: Released
Priority: 2
Category: Packaging

Description

The following cron sometimes causes errors on some nodes:

Cron <root@node> if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi

The error is as follows:

/bin/sh: ${g.rudder_base}/etc/disable-agent: bad substitution
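
For context, this message comes from the shell rather than from CFEngine: `${g.rudder_base}` is CFEngine variable syntax, and a dot is not valid in a shell parameter name, so the shell refuses the expansion. A minimal sketch that reproduces the behaviour, assuming a POSIX `sh` is on the PATH (the helper name `try_expand` is hypothetical and not part of Rudder):

```shell
#!/bin/sh
# Hypothetical helper: let a child shell parse a line containing a
# ${...} reference and report whether the shell accepted it.
try_expand() {
    # "$1" is inserted as-is into the child's script text,
    # so the child shell performs the expansion itself.
    if sh -c "echo $1" >/dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

# A valid shell parameter name is accepted.
try_expand '${HOME}/etc/disable-agent'
# A CFEngine-style dotted name is rejected with "bad substitution".
try_expand '${g.rudder_base}/etc/disable-agent'
```

The second call fails in the same way as the cron job does when the file still holds the unexpanded template.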

Related issues 1 (0 open, 1 closed)

Has duplicate Rudder - Bug #3600: On a new installation of Rudder, the user root got mail containing error about execution of cron (Rejected, 2013-05-17)
Actions #1

Updated by Vincent MEMBRÉ almost 11 years ago

  • Category set to Techniques
  • Assignee set to Vincent MEMBRÉ
  • Priority changed from N/A to 2

It seems that a CFEngine variable has not been expanded. I'll look into this!

Actions #2

Updated by Vincent MEMBRÉ almost 11 years ago

  • Status changed from New to Discussion
  • Assignee changed from Vincent MEMBRÉ to Dennis Cabooter

Does this happen on all nodes, or just on a subset of them?

Is there any link between them? Same OS?

Does it happen at every run, or is the variable sometimes expanded correctly and sometimes not?

The template should have been expanded correctly.

Are all nodes using the same agent version as the server?

Actions #3

Updated by Dennis Cabooter almost 11 years ago

It happens sometimes on a few nodes. This time it doesn't stop anymore. They do have the same OS; RHEL 5. The clients use the same agent version as the server.

Question.. There's a /etc/cron.d/rudder-agent file, but also the same entry exists in /etc/crontab. Is this correct? And if so, what is the reason for this?

Actions #4

Updated by Dennis Cabooter almost 11 years ago

# tail -1 /etc/crontab
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf \&\& /var/rudder/cfengine-community/bin/cf-agent; fi

# cat /etc/cron.d/rudder-agent
# Cron file for Rudder
#
# Will manually run cf-agent in case cf-execd is no longer running. cf-agent will fire up a new cf-execd.
#
# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
# Don't forget to remove that file when you're done!

0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi
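
For readers who find the one-line guard hard to follow, here is a sketch of the same logic spread over several lines. It is an illustration only, not the packaged script: the function name `agent_should_run` is hypothetical, and the paths are the ones visible in the expanded /etc/crontab entry above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the cron guard. It succeeds only when the
# disable file is absent and no cf-execd or cf-agent process from the given
# bin directory shows up in the process list.
agent_should_run() {
    disable_file="$1"
    bin_dir="$2"

    # An existing disable file always wins.
    if [ -e "$disable_file" ]; then
        return 1
    fi

    # Condensed form of the ps/grep chain from the cron line: grep exits 0
    # as soon as at least one matching process line is left.
    if ps -efww | grep -E "$bin_dir/(cf-execd|cf-agent)" | grep -v grep >/dev/null; then
        return 1
    fi
    return 0
}

# Paths as they appear in the expanded crontab entry.
if agent_should_run /opt/rudder/etc/disable-agent /var/rudder/cfengine-community/bin; then
    echo "cron would start cf-agent now"
fi
```

With the unexpanded template, the shell never reaches this logic, because it fails while parsing `${g.rudder_base}`.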
Actions #5

Updated by Nicolas CHARLES almost 11 years ago

  • Subject changed from chmod: cannot access `/var/rudder/cfengine-community/ppkeys/*' to ${g.rudder_base} in the crontab files
Actions #6

Updated by Dennis Cabooter almost 11 years ago

Here are the contents of /var/rudder/cfengine-community/inputs/common/1.0/process_matching.cf:

#####################################################################################
# Copyright 2011 Normation SAS
#####################################################################################
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, Version 3.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#####################################################################################

bundle agent process_matching
{
  vars:

      # This deliberately excludes cf-execd which is handled separately below
      "cf_components" slist => {
        "cf-key",
        # "cf-monitord", Disabled
        "cf-promises",
        "cf-report",
        "cf-runagent",
        "cf-serverd" 
      };

    windows::
      "stop_signal" string => "kill";

    !windows::
      "stop_signal" string => "term";

  classes:

      "restart_cf" expression => "Hr05.Min00_05.!disable_agent";

  processes:

    windows::
      # Always stop cf-monitord
      "${g.escaped_workdir}\/bin\/cf-monitord" signals => { "${stop_signal}" };

    linux::
      # Always stop cf-monitord
      "${sys.workdir}/bin/cf-monitord"         signals => { "${stop_signal}" };

    restart_cf.!policy_server::
      "${cf_components}" signals => { "${stop_signal}" };

    # Policy servers have both Nova and Community, don't blindly kill the wrong processes
    restart_cf.policy_server::
      "${sys.workdir}/bin/${cf_components}" signals => { "${stop_signal}" };

    restart_cf.!windows::
      "${sys.workdir}/bin/cf-execd"         signals => { "${stop_signal}" };

  commands:

    restart_cf.!windows::

      "${sys.cf_serverd}";
      "${sys.cf_execd}";

  files:

    linux::

      # This is to cleanup /etc/crontab from pre-2.5 usage
      # It can be removed before 2.6
      "${g.crontab}" 
        edit_line => cron_cleanup;

    community_edition::

      "/etc/cron.d/rudder-agent" 
        create        => "true",
        perms         => mog("644", "root", "root"),
        edit_defaults => empty,
        edit_line     => expand_template("${sys.workdir}/inputs/common/cron/rudder_agent_community_cron");
  reports:

    restart_cf::
      "Reloaded configuration of all Cfengine components";

}

# This is to cleanup /etc/crontab from pre-2.5 usage
# It can be removed before 2.6
bundle edit_line cron_cleanup
{

# Remove old lines to replace them with new version
  delete_lines:

      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-execd; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/cfengine/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/cfengine/bin/cf-agent -f failsafe.cf && /var/cfengine/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep cf-execd \| grep \"/var/rudder/cfengine-community/bin/cf-execd\" \| grep -v grep \| wc -l\` -eq 0 \]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep -E \"\(cf-execd\|cf-agent\)\" \| grep -E \"/var/cfengine/bin/\(cf-execd\|cf-agent\)\" \| grep -v grep \| wc -l\` -eq 0 ]; then /var/cfengine/bin/cf-agent -f failsafe.cf && /var/cfengine/bin/cf-agent; fi";
      "0,5,10,15,20,25,30,35,40,45,50,55 \* \* \* \* root if \[ ! -e /opt/rudder/etc/disable-agent -a \`ps -efww \| grep -E \"\(cf-execd\|cf-agent\)\" \| grep -E \"/var/rudder/cfengine-community/bin/\(cf-execd\|cf-agent\)\" \| grep -v grep \| wc -l\` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi";

}
Actions #7

Updated by Nicolas CHARLES almost 11 years ago

OK, so it seems to be related to a packaging or upgrade problem:

/var/rudder/cfengine-community/bin/cf-agent: error while loading shared libraries: libtokyocabinet.so.9: cannot open shared object file: No such file or directory

Reinstalling the rudder-agent package solved the problem.

Could it be that some promises got corrupted, or that the initial promises kicked in during the upgrade, and that the agent then broke and stopped repairing the content of the file?

Actions #8

Updated by Dennis Cabooter almost 11 years ago

This problem occurs on all RHEL nodes without exception. On these nodes reinstalling rudder-agent solves the problem.

Actions #9

Updated by Vincent MEMBRÉ almost 11 years ago

  • Subject changed from ${g.rudder_base} in the crontab files to When migrating a RHEL5 node to Rudder 2.6, CFEngine 3.4.4 is not correctly installed (libtokyo is missing)
  • Status changed from Discussion to 8

Libtokyocabinet is not installed when rudder-agent is upgraded on RHEL5.

This is due to the behavior of rpm packages on upgrade.

On an rpm upgrade, the build part is not run (see http://www.ibm.com/developerworks/library/l-rpm2/).

This part does all the preparation for CFEngine 3.4 (libtokyo included). Since it is not called, RHEL5 nodes migrated from Rudder 2.5 cannot use CFEngine 3.4.

Reinstalling the agent fixes this bug, but we have to fix the rpm package to handle that case.

Actions #10

Updated by Vincent MEMBRÉ almost 11 years ago

I just did a migration on a CentOS 5 (5.7) node, and it worked like a charm.

Besides, the build section is only used to build the package, so it should not be the cause of this problem.

I still need to reproduce this bug...

Actions #11

Updated by Nicolas PERRON over 10 years ago

  • Target version changed from 2.6.3 to 2.6.4
Actions #12

Updated by Andrew Cranson over 10 years ago

This is also happening on new installations of rudder-agent-2.7.0.release-1.EL.5 on CentOS release 5.5 (Final) and rudder-agent-2.7.0.release-1.EL.6.x86_64 on CentOS release 6.3 (Final).

CT-9123-bash-3.2# cat /etc/cron.d/rudder-agent
# Cron file for Rudder
#
# Will manually run cf-agent in case cf-execd is no longer running. cf-agent will fire up a new cf-execd.
#
# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
# Don't forget to remove that file when you're done!

0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi

The email from cron has the subject:

Cron <root@cdp-automation> if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi

And the following content:

/bin/sh: ${g.rudder_base}/etc/disable-agent: bad substitution

This is when using rudder 1.7.0-release1 as policy server (COS6) and agent. These are new nodes registered in rudder, not migrated nodes.

Actions #13

Updated by Vincent MEMBRÉ over 10 years ago

  • Assignee changed from Dennis Cabooter to Nicolas PERRON

Thank you, Andrew, for reporting your installation error. It seems we really have a problem here and need to dig deeper into it.

Nicolas, can you look into this please?

Actions #14

Updated by Andrew Cranson over 10 years ago

Thanks. I'll run some more tests today and try to find a reliably reproducible test case for you (it was late last night when I was working on this).

Actions #15

Updated by Nicolas PERRON over 10 years ago

  • Status changed from 8 to Discussion
  • Assignee changed from Nicolas PERRON to Dennis Cabooter

Andrew Cranson wrote:

Thanks. I'll run some more tests today and try to find a reliably reproducible test case for you (it was late last night when I was working on this).

I've tried to reproduce this, without success. Did you install another version of Rudder before?
Do you have any other Rudder entries in cron?

grep -r  "rudder" /etc/cron*

Actions #16

Updated by Nicolas PERRON over 10 years ago

  • Assignee changed from Dennis Cabooter to Nicolas PERRON
Actions #17

Updated by Andrew Cranson over 10 years ago

This is the first time Rudder was installed, and there are no other entries:

CT-9123-bash-3.2# grep -r  "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf && /var/rudder/cfengine-community/bin/cf-agent; fi
CT-9123-bash-3.2# grep rudder /var/log/yum.log
Aug 19 22:23:48 Installed: 1398866025:rudder-agent-2.7.0.release-1.EL.5.i386
CT-9123-bash-3.2#

I'm trying some test cases now. I'll update you today.

Actions #18

Updated by Andrew Cranson over 10 years ago

(p.s. I checked yum.log's since the server was created - rudder was never installed on this server)

Actions #19

Updated by Nicolas PERRON over 10 years ago

Andrew Cranson wrote:

(p.s. I checked yum.log's since the server was created - rudder was never installed on this server)

OK, and is the problem still present?

Actions #20

Updated by Andrew Cranson over 10 years ago

Yes. I have a reproducible test case.

This is what I did exactly, step-by-step:

  1. Install a new Linux VPS with CentOS 6 x86_64 (on Parallels Cloud Server)
  2. Install rudder-agent repo with the following script:
    #!/bin/sh
    
    function install_rudder_repo() {
        [ -f /etc/yum.repos.d/rudder.repo ] && return
    
        local dist_name=''
        local dist_version=''
    
        if [ -f /etc/redhat-release ]; then
                dist_name="CentOS" 
                dist_version=`cat /etc/redhat-release | sed -e 's/^.\+\([0-9]\+\.[0-9]\+\).\+$/\1/g'`
        fi
    
        [ "$dist_name" = "" ] && return
    
        if [ "$dist_name" = "CentOS" ]; then
            echo $dist_version | grep "^5." > /dev/null
            [ $? -eq 0 ] && echo "[Rudder_2.7]" > /etc/yum.repos.d/rudder.repo && echo "name=Rudder 2.7 Repository RHEL5" >> /etc/yum.repos.d/rudder.repo && echo "baseurl=http://www.rudder-project.org/rpm-2.7/RHEL_5/" >> /etc/yum.repos.d/rudder.repo && echo "gpgcheck=0" >> /etc/yum.repos.d/rudder.repo && echo "" >> /etc/yum.repos.d/rudder.repo
            echo $dist_version | grep "^6." > /dev/null
            [ $? -eq 0 ] && echo "[Rudder_2.7]" > /etc/yum.repos.d/rudder.repo && echo "name=Rudder 2.7 Repository RHEL6" >> /etc/yum.repos.d/rudder.repo && echo "baseurl=http://www.rudder-project.org/rpm-2.7/RHEL_6/" >> /etc/yum.repos.d/rudder.repo && echo "gpgcheck=0" >> /etc/yum.repos.d/rudder.repo && echo "" >> /etc/yum.repos.d/rudder.repo
        fi
    }
    
    install_rudder_repo
    
  3. yum clean all
  4. yum install rudder-agent -y
  5. yum update rudder-agent -y
    #echo "our policy server IP" > /var/rudder/cfengine-community/policy_server.dat
    #iptables -I INPUT -s our-policy-server-ip/32 -j ACCEPT
    #/etc/init.d/rudder-agent restart

However, as you can see cron is incorrect:

CT-10150-bash-4.1# grep -r  "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e ${g.rudder_base}/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "${sys.workdir}/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then ${sys.workdir}/bin/cf-agent -f failsafe.cf \&\& ${sys.workdir}/bin/cf-agent; fi

After a few minutes, the cron file corrects itself (without me doing anything):

CT-10150-bash-4.1# grep -r  "rudder" /etc/cron*
/etc/cron.d/rudder-agent:# To temporarily avoid this behaviour, touch /opt/rudder/etc/disable-agent.
/etc/cron.d/rudder-agent:0,5,10,15,20,25,30,35,40,45,50,55 * * * * root if [ ! -e /opt/rudder/etc/disable-agent -a `ps -efww | grep -E "(cf-execd|cf-agent)" | grep -E "/var/rudder/cfengine-community/bin/(cf-execd|cf-agent)" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-agent -f failsafe.cf \&\& /var/rudder/cfengine-community/bin/cf-agent; fi

Would you like me to privately send you root login details to see this for yourself? (This is a test Linux VPS - CentOS 6 x86_64) If so, please confirm how to send this.

I've successfully reproduced this 5 times on the same system so I think it will be easy to reproduce if I give you access.

Thanks
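
To tell whether a node is still in the broken state described above, one option is to test the cron file for leftover CFEngine-style references. This is a sketch only: the helper name `has_unexpanded_vars` and the regular expression are assumptions chosen to match the two variables seen in this report, not something shipped with Rudder.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE still contains CFEngine-style
# references such as ${g.rudder_base} or ${sys.workdir}.
has_unexpanded_vars() {
    grep -q '\${[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*}' "$1"
}

# Default to the cron file discussed in this issue.
cron_file="${1:-/etc/cron.d/rudder-agent}"
if [ -f "$cron_file" ] && has_unexpanded_vars "$cron_file"; then
    echo "$cron_file still contains unexpanded CFEngine variables"
fi
```

The check prints nothing once the agent has rewritten the file with the expanded paths.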

Actions #21

Updated by Nicolas PERRON over 10 years ago

Andrew Cranson wrote:

Yes. I have a reproducible test case.

This is what I did exactly, step-by-step:

  1. Install a new Linux VPS with CentOS 6 x86_64 (on Parallels Cloud Server)
  2. Install rudder-agent repo with the following script:
    [...]
  3. yum clean all
  4. yum install rudder-agent -y
  5. yum update rudder-agent -y
    #echo "our policy server IP" > /var/rudder/cfengine-community/policy_server.dat
    #iptables -I INPUT -s our-policy-server-ip/32 -j ACCEPT
    #/etc/init.d/rudder-agent restart

However, as you can see cron is incorrect:

[...]

After a few minutes, the cron file corrects itself (without me doing anything):

[...]

Would you like me to privately send you root login details to see this for yourself? (This is a test Linux VPS - CentOS 6 x86_64) If so, please confirm how to send this.

I've successfully reproduced this 5 times on the same system so I think it will be easy to reproduce if I give you access.

Thanks

OK, I see what the problem is. This is a duplicate of #3600.

Actions #22

Updated by Nicolas PERRON over 10 years ago

  • Status changed from Discussion to In progress

I will treat this issue as the main one between it and #3600.
OK, so the problem persists until the node communicates with a server. The problem is in the initial promises.

I'm on it.

Actions #23

Updated by Nicolas PERRON over 10 years ago

Got it.

This problem comes from #3204, which added /etc/cron.d/rudder-agent as a packaged file. However, that file is a template with unexpanded variables, and it will not work until CFEngine has been launched.

The only solution I see is to replace the variables in the file with sed.
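
A minimal sketch of what such a sed substitution could look like. It is not the content of the eventual pull request: the helper name `expand_cron_template` is hypothetical, and the two replacement paths are assumptions taken from the expanded crontab lines quoted earlier in this thread. The escaped `\&\&` in the template is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with the two CFEngine references replaced
# by the paths seen in the expanded crontab lines in this thread.
# These values are assumed for a default install.
expand_cron_template() {
    sed -e 's|\${g\.rudder_base}|/opt/rudder|g' \
        -e 's|\${sys\.workdir}|/var/rudder/cfengine-community|g' \
        "$1"
}
```

For example, `expand_cron_template /etc/cron.d/rudder-agent` writes the substituted file to standard output, so the result can be reviewed before it replaces the original.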

Actions #24

Updated by Nicolas PERRON over 10 years ago

  • Subject changed from When migrating a RHEL5 node to Rudder 2.6, CFEngine 3.4.4 is not correctly installed (libtokyo is missing) to Rudder cron file contains error until the use of CFEngine and will display error into /var/mail for root
  • Status changed from In progress to Pending technical review
  • % Done changed from 0 to 100
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/99

Pull Request URL added: https://github.com/Normation/rudder-packages/pull/99

Matthieu, could you review it please ?

Actions #25

Updated by Nicolas PERRON over 10 years ago

  • Assignee changed from Nicolas PERRON to Matthieu CERDA
Actions #26

Updated by Matthieu CERDA over 10 years ago

  • Assignee changed from Matthieu CERDA to Nicolas PERRON

I approve this commit, but I'm not sure that this should be merged now.

Actions #27

Updated by Andrew Cranson over 10 years ago

We can implement a workaround (using sed) in our installation script until this is approved, so no problem delaying the merge temporarily from our side. Thanks to you both for your help.

Actions #28

Updated by Jonathan CLARKE over 10 years ago

  • Status changed from Pending technical review to Discussion

Nico, I added some comments in the PR, to try and simplify this. Can you address them please?

Actions #29

Updated by Nicolas PERRON over 10 years ago

  • Assignee changed from Nicolas PERRON to Jonathan CLARKE

Jonathan CLARKE wrote:

Nico, I added some comments in the PR, to try and simplify this. Can you address them please?

PR Updated and rebased.

Actions #30

Updated by Nicolas PERRON over 10 years ago

  • Status changed from Discussion to Pending technical review
Actions #31

Updated by Jonathan CLARKE over 10 years ago

  • Project changed from 24 to 34
  • Category deleted (Techniques)
Actions #32

Updated by Jonathan CLARKE over 10 years ago

  • Status changed from Pending technical review to Pending release
Actions #33

Updated by Nicolas PERRON over 10 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.6.4, which was released today.
Check out:

Actions #34

Updated by Benoît PECCATTE about 9 years ago

  • Project changed from 34 to Rudder
  • Category set to Packaging