Bug #12243
Agent components should not try to load failsafe.cf when policies are broken
Status: closed
Pull Request:
Severity:
Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility:
Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority:
76
Name check:
Fix check:
Regression:
Description
I upgraded from Rudder 4.2 (rudder-webapp_4.2.5~rc1~git201803160242-stretch0_all.deb) to rudder-webapp_4.3.0~rc2~git201803200032-stretch0_all.deb on Debian 9.
After the upgrade, the webapp is working and I can log in, but nodes connected to the server can't get their policies anymore:
root@relay:/home/vagrant# rudder agent update -i
rudder info: Failed to connect to server: Connection refused
rudder info: No server is responding on port: 5309
rudder info: Unable to establish connection to 'server'
error: No suitable server found
rudder info: Automatically promoting context scope for 'rudder_promises_generated_tmp_file_error' to namespace visibility, due to persistence
rudder info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 223
rudder info: Failed to connect to server: Connection refused
rudder info: No server is responding on port: 5309
rudder info: Unable to establish connection to 'server'
error: No suitable server found
rudder info: Automatically promoting context scope for 'rudder_ncf_hash_update_error' to namespace visibility, due to persistence
rudder info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 231
rudder info: Failed to connect to server: Connection refused
rudder info: No server is responding on port: 5309
rudder info: Unable to establish connection to 'server'
error: No suitable server found
rudder info: Automatically promoting context scope for 'rudder_ncf_hash_update_error' to namespace visibility, due to persistence
rudder info: Promise belongs to bundle 'update_action' in file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' near line 237
R: *********************************************************************************
* rudder-agent could not get an updated configuration from the policy server. *
* This can be caused by: *
* * a networking issue *
* * an unavailable server *
* * if the node's IP in not if the allowed networks of its policy server. *
* Any existing configuration policy will continue to be applied without change. *
*********************************************************************************
ok: Rudder agent promises were updated.
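To confirm independently of the agent that nothing is listening on port 5309, a plain TCP connection test can be run from the relay. This is only a minimal sketch: the helper name `port_is_open` and the 3-second timeout are my own choices, not part of Rudder.

```python
import socket


def port_is_open(host: str, port: int = 5309, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; 5309 is the port reported
    in the agent output above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

Calling `port_is_open("server")` from the relay should return `False` for as long as the agent keeps reporting "Connection refused".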
On the server, cf-execd is started, and its start time is consistent with the upgrade time:
root@server:/home/vagrant/plop# ps aux | grep cf-
root 9011 0.0 0.5 105040 8972 ? Ss 09:40 0:00 /var/rudder/cfengine-community/bin/cf-execd --no-fork
root 12321 0.0 0.5 38632 8912 ? Ss 09:40 0:00 /var/rudder/cfengine-community/bin/cf-serverd --no-fork
But in the system logs, I see:
Mar 20 09:40:14 server systemd[1]: Stopped Rudder agent umbrella service.
Mar 20 09:40:15 server systemd[1]: Started CFEngine Execution Scheduler.
Mar 20 09:40:15 server systemd[1]: Starting Rudder agent umbrella service...
Mar 20 09:40:15 server systemd[1]: Started CFEngine file server.
Mar 20 09:40:15 server systemd[1]: Started Rudder agent umbrella service.
Mar 20 09:40:15 server cf-serverd[9013]: error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.3/_var_rudder_ncf_common_20_cfe_basics': No such file or directory' for parsing.
Mar 20 09:40:15 server cf-serverd[9013]: CFEngine(server) rudder Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.3/_var_rudder_ncf_common_20_cfe_basics': No such file or directory
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Main process exited, code=exited, status=1/FAILURE
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Unit entered failed state.
Mar 20 09:40:15 server systemd[1]: rudder-cf-serverd.service: Failed with result 'exit-code'.
...
Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/promises.cf:362:0: error: Undefined bundle _create_current_expected_reports_file with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/promises.cf:741:0: error: Undefined bundle _clean_old_expected_reports_file with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]: /var/rudder/cfengine-community/inputs/rudder-directives.cf:37:0: error: Undefined bundle current_technique_report_info with type usebundle
Mar 20 09:40:26 server cf-serverd[12321]: error: Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
Mar 20 09:40:26 server cf-serverd[12321]: error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
Mar 20 09:40:26 server cf-serverd[12321]: error: CFEngine failsafe.cf: /var/rudder/cfengine-community/inputs /var/rudder/cfengine-community/inputs/failsafe.cf
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server) Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server) CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server) CFEngine failsafe.cf: /var/rudder/cfengine-community/inputs /var/rudder/cfengine-community/inputs/failsafe.cf
Mar 20 09:40:26 server cf-serverd[12321]: notice: Server is starting...
Mar 20 09:40:26 server cf-serverd[12321]: CFEngine(server) rudder Server is starting...
...
Mar 20 09:40:40 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:40:40 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:28 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:28 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:55 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:41:55 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:42:12 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:42:12 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:43:29 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:43:29 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
...
Mar 20 09:45:27 server cf-agent[19573]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@start@@20180320-094124-53ed56e5@@2018-03-20 ....
Mar 20 09:45:29 server cf-serverd[12321]: notice: Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
Mar 20 09:45:29 server cf-serverd[12321]: CFEngine(server) rudder Rereading policy file '/var/rudder/cfengine-community/inputs/failsafe.cf'
....
Mar 20 09:45:31 server cf-agent[19573]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@end@@20180320-094124-53ed56e5@@2018-03-20 09:45:26+00:00##root@#End execution
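For anyone trying to spot the same situation on another server, here is a rough sketch that scans syslog lines for the two symptoms shown above: the "Undefined bundle" validation errors and the "so going to failsafe" fallback. The function name `summarize` and the regular expression are my own and are matched only against the message wording in this report; other CFEngine versions may word these messages differently.

```python
import re

# Pattern taken from the error lines quoted in this report (assumption:
# the wording is stable for this CFEngine version).
UNDEFINED_BUNDLE_RE = re.compile(
    r"error: Undefined bundle (\S+) with type usebundle"
)
FAILSAFE_MARKER = "so going to failsafe"


def summarize(lines):
    """Return (undefined_bundles, fell_back_to_failsafe) for the given log lines.

    undefined_bundles keeps first-seen order without duplicates.
    """
    bundles = []
    fell_back = False
    for line in lines:
        match = UNDEFINED_BUNDLE_RE.search(line)
        if match and match.group(1) not in bundles:
            bundles.append(match.group(1))
        if FAILSAFE_MARKER in line:
            fell_back = True
    return bundles, fell_back
```

Feeding it the lines of the system log (for example from an open file object) gives the list of missing bundles and tells whether cf-serverd silently switched to failsafe.cf, which is the behaviour this ticket is about.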