Bug #12243

closed

Agent components should not try to load failsafe.cf when policies are broken

Added by François ARMAND about 6 years ago. Updated almost 6 years ago.

Status: Released
Priority: N/A
Category: Server components
Target version:
Severity: Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 76
Name check:
Fix check:
Regression:

Description

I upgraded from Rudder 4.2 (rudder-webapp_4.2.5~rc1~git201803160242-stretch0_all.deb) to rudder-webapp_4.3.0~rc2~git201803200032-stretch0_all.deb on Debian 9.

After the upgrade, the webapp is working and I can log in, but nodes connected to the server can no longer get their policies:

root@relay:/home/vagrant# rudder agent update -i
rudder     info: Failed to connect to server: Connection refused
rudder     info: No server is responding on port: 5309
rudder     info: Unable to establish connection to 'server'
   error: No suitable server found
rudder     info: Automatically promoting context scope for 'rudder_promises_generated_tmp_file_error' to namespace visibility, due to persistence
rudder     info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 223
rudder     info: Failed to connect to server: Connection refused
rudder     info: No server is responding on port: 5309
rudder     info: Unable to establish connection to 'server'
   error: No suitable server found
rudder     info: Automatically promoting context scope for 'rudder_ncf_hash_update_error' to namespace visibility, due to persistence
rudder     info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 231
rudder     info: Failed to connect to server: Connection refused
rudder     info: No server is responding on port: 5309
rudder     info: Unable to establish connection to 'server'
   error: No suitable server found
rudder     info: Automatically promoting context scope for 'rudder_ncf_hash_update_error' to namespace visibility, due to persistence
rudder     info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 237
R: *********************************************************************************
* rudder-agent could not get an updated configuration from the policy server.   *
* This can be caused by:                                                        *
*   * a networking issue                                                        *
*   * an unavailable server                                                     *
*   * if the node's IP in not if the allowed networks of its policy server.     *
* Any existing configuration policy will continue to be applied without change. *
*********************************************************************************
ok: Rudder agent promises were updated.
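
The "Connection refused" on port 5309 reported by the agent can be double-checked from the node with a plain TCP connect, independently of the agent. This is a minimal sketch, not part of Rudder: `port_is_open` is a hypothetical helper name, and the host name `server` and port 5309 are simply taken from the agent output above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example call (assumed host name, as seen in the agent output):
#   port_is_open("server", 5309)
```

A `False` result only says that nothing accepted the connection within the timeout; it does not tell whether cf-serverd is stopped, still starting, or filtered by a firewall.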

On the server, cf-execd is running, and its start time is consistent with the update time:

root@server:/home/vagrant/plop# ps aux | grep cf-
root      9011  0.0  0.5 105040  8972 ?        Ss   09:40   0:00 /var/rudder/cfengine-community/bin/cf-execd --no-fork
root     12321  0.0  0.5  38632  8912 ?        Ss   09:40   0:00 /var/rudder/cfengine-community/bin/cf-serverd --no-fork

But in the system logs, I have:

Mar 20 09:40:14 server systemd[1]: Stopped Rudder agent umbrella service.
Mar 20 09:40:15 server systemd[1]: Started CFEngine Execution Scheduler.
Mar 20 09:40:15 server systemd[1]: Starting Rudder agent umbrella service...
Mar 20 09:40:15 server systemd[1]: Started CFEngine file server.
Mar 20 09:40:15 server systemd[1]: Started Rudder agent umbrella service.
Mar 20 09:40:15 server cf-serverd[9013]:    error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.3/_var_rudder_ncf_common_20_cfe_basics': No such file or directory' for parsing.
Mar 20 09:40:15 server cf-serverd[9013]: CFEngine(server) rudder Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.3/_var_rudder_ncf_common_20_cfe_basics': No such file or directory
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Main process exited, code=exited, status=1/FAILURE
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Unit entered failed state.
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Failed with result 'exit-code'.
...

Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/promises.cf:362:0: error: Undefined bundle _create_current_expected_reports_file with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/promises.cf:741:0: error: Undefined bundle _clean_old_expected_reports_file with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/rudder-directives.cf:37:0: error: Undefined bundle current_technique_report_info with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]:    error: Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
Mar 20 09:40:26 server cf-serverd[12321]:    error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
Mar 20 09:40:26 server cf-serverd[12321]:    error: CFEngine failsafe.cf: /var/rudder/cfengine-community/inputs /var/rudder/cfengine-community/inputs/failsafe.cf
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server)  Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server)  CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server)  CFEngine failsafe.cf: /var/rudder/cfengine-community/inputs /var/rudder/cfengine-community/inputs/failsafe.cf
Mar 20 09:40:26 server cf-serverd[12321]:   notice: Server is starting...
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server) rudder Server is starting...
...

Mar 20 09:40:40 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:40:40 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:28 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:28 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:55 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:55 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:42:12 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:42:12 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:43:29 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:43:29 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
...

Mar 20 09:45:27 server cf-agent[19573]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@start@@20180320-094124-53ed56e5@@2018-03-20 
  ....
Mar 20 09:45:29 server cf-serverd[12321]:   notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:45:29 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
  ....
Mar 20 09:45:31 server cf-agent[19573]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@end@@20180320-094124-53ed56e5@@2018-03-20 09:45:26+00:00##root@#End execution
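
The sequence above (policy validation fails, then the component falls back to failsafe.cf) can be spotted mechanically in the system logs. This is a minimal sketch, assuming log lines formatted like the excerpts above; `find_failsafe_fallbacks` is a hypothetical helper name, and the matched text is the "so going to failsafe" message shown in these logs.

```python
import re

# Matches e.g. "cf-serverd[12321]: ... so going to failsafe" as seen above.
FALLBACK = re.compile(r"(cf-\w+)\[(\d+)\]:.*so going to failsafe")


def find_failsafe_fallbacks(lines):
    """Return the set of (component, pid) pairs that fell back to failsafe.cf."""
    hits = set()
    for line in lines:
        match = FALLBACK.search(line)
        if match:
            hits.add((match.group(1), int(match.group(2))))
    return hits
```

Each process is reported once even though the message is logged twice per event (once with the `error:` prefix and once with the `CFEngine(server)` prefix).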


Subtasks 1 (0 open, 1 closed)

Bug #12265: rudder agent check should trigger failsafe run when promises are broken | Released | Alexis Mousset