Bug #10532
closed
On centos7, generation fails because of hook policy-generation-node-ready/10-cf-promise-check
Added by François ARMAND over 7 years ago.
Updated over 7 years ago.
Category:
Web - Config management
Description
The generation error is:
⇨ Policy update error for process '4' at 2017-03-29 13:25:32
⇨ Cannot write configuration node
⇨ Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/sbin:/usr/sbin:/bin:/usr/bin] [SYSTEMCTL_IGNORE_DEPENDENCIES:] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [SYSTEMCTL_SKIP_REDIRECT:] [OLDPWD:/opt/rudder/jetty7] [TERM:vt100] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:2] [_:/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-29T13:25:29.110Z] [RUDDER_NODEID:d456d9d9-9d1a-4116-a495-340ed55f6c32] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/share/d456d9d9-9d1a-4116-a495-340ed55f6c32/rules.new/cfengine-community] [RUDDER_AGENT_TYPE:cfengine-community].
Stdout: ' error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_10_ncf_internals': No such file or directory' for parsing. (stat: No such file or directory)
'
Stderr: ''
This is transient: manually starting another full regeneration corrected the problem.
We have the problem on 2 different CentOS 7 servers. The failing part is not always the same. Asking for a new regeneration (or a full new generation) clears the problem (most of the time).
Asking for a full regeneration has a high probability (35%?) of leading to a variant of the error.
It is not always the same node, nor the same file, that triggers the error. Some examples below:
#### two nodes failing, root and one behind a relay:
Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/sbin:/usr/sbin:/bin:/usr/bin] [SYSTEMCTL_IGNORE_DEPENDENCIES:] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [SYSTEMCTL_SKIP_REDIRECT:] [OLDPWD:/opt/rudder/jetty7] [TERM:vt100] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:2] [_:/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-29T14:16:00.537Z] [RUDDER_NODEID:root] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/cfengine-community/inputs.new] [RUDDER_AGENT_TYPE:cfengine-community].
Stdout: ' error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_30_generic_methods': No such file or directory' for parsing. (stat: No such file or directory)
'
Stderr: '' ; Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/sbin:/usr/sbin:/bin:/usr/bin] [SYSTEMCTL_IGNORE_DEPENDENCIES:] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [SYSTEMCTL_SKIP_REDIRECT:] [OLDPWD:/opt/rudder/jetty7] [TERM:vt100] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:2] [_:/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-29T14:16:00.537Z] [RUDDER_NODEID:26acd240-5347-4e0c-91e9-d6281341cf2b] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/share/19a0eaaf-7d56-4251-a697-83942b18df20/share/26acd240-5347-4e0c-91e9-d6281341cf2b/rules.new/cfengine-community] [RUDDER_AGENT_TYPE:cfengine-community].
Stdout: ' error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_10_ncf_internals': No such file or directory' for parsing. (stat: No such file or directory)
'
Stderr: ''
#### another example:
⇨ Policy update error for process '17' at 2017-03-29 14:22:29
⇨ Cannot write configuration node
⇨ Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/sbin:/usr/sbin:/bin:/usr/bin] [SYSTEMCTL_IGNORE_DEPENDENCIES:] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [SYSTEMCTL_SKIP_REDIRECT:] [OLDPWD:/opt/rudder/jetty7] [TERM:vt100] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:2] [_:/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-29T14:22:23.982Z] [RUDDER_NODEID:19a0eaaf-7d56-4251-a697-83942b18df20] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/share/19a0eaaf-7d56-4251-a697-83942b18df20/rules.new/cfengine-community] [RUDDER_AGENT_TYPE:cfengine-community].
Stdout: ' error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_60_services': No such file or directory' for parsing. (stat: No such file or directory)
'
Stderr: ''
#### yet another example:
⇨ Policy update error for process '26' at 2017-03-29 14:27:30
⇨ Cannot write configuration node
⇨ Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/sbin:/usr/sbin:/bin:/usr/bin] [SYSTEMCTL_IGNORE_DEPENDENCIES:] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [SYSTEMCTL_SKIP_REDIRECT:] [OLDPWD:/opt/rudder/jetty7] [TERM:vt100] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:2] [_:/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-29T14:27:23.982Z] [RUDDER_NODEID:26acd240-5347-4e0c-91e9-d6281341cf2b] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/share/19a0eaaf-7d56-4251-a697-83942b18df20/share/26acd240-5347-4e0c-91e9-d6281341cf2b/rules.new/cfengine-community] [RUDDER_AGENT_TYPE:cfengine-community].
Stdout: ' error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_20_cfe_basics': No such file or directory' for parsing. (stat: No such file or directory)
'
Stderr: ''
And so on.
This is a race condition in list-compatible-inputs that seems to only happen on CentOS7:
root@server:/opt/rudder/jetty7# /var/rudder/cfengine-community/bin/cf-promises -f /var/rudder/cfengine-community/inputs.new/promises.cf &
[6] 22088
root@server:/opt/rudder/jetty7# /var/rudder/cfengine-community/bin/cf-promises -f /var/rudder/cfengine-community/inputs.new/promises.cf &
[7] 22645
root@server:/opt/rudder/jetty7# error: Can't stat file '/var/rudder/ncf/find: '/var/rudder/cfengine-community/state/ncf-exclude-cache-3.10.0/_var_rudder_ncf_common_30_generic_methods': No such file or directory' for parsing. (stat: No such file or directory)
This only happens when SELinux is enabled. If we "setenforce 0", the problem completely disappears.
So, it may not be linked to SELinux after all: the problem was just less frequent while testing that track.
The problem may be that the file "/opt/rudder/etc/agent-capabilities" was not seen as newer than the cache (because, for some reason, the cache was modified in the future), so one of the parallel processes deletes the cache, and the next one tries to compare against deleted files.
- Assignee set to Alexis Mousset
- Target version changed from 4.1.1 to 4.0.4
What happens here:
- Our CentOS7 package builder had an EDT (-04) timezone for some reason
- When installing recently built packages on a machine ahead of EDT, /opt/rudder/etc/agent-capabilities has a modification date in the future.
- The comparison for compatible files cache invalidation in ncf's list-compatible-inputs is based on the modification date of /opt/rudder/etc/agent-capabilities compared to the cache itself
- Because that modification date is in the future, the file always looks newer than the cache, so list-compatible-inputs always invalidates the cache and rebuilds a new one
- When starting several list-compatible-inputs calls at the same time (which is what the generation process does when calling cf-promises to check the files), all of them invalidate the cache every time. This leads to race conditions where one of the instances tries to reach a file that has just been removed by another one. Normally this should be very uncommon, and can only happen just after updating the agent or ncf.
- => generation is randomly broken
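The invalidation logic described in the list above can be sketched as follows. This is only an illustration, not the actual list-compatible-inputs code: the temporary paths stand in for /opt/rudder/etc/agent-capabilities and the ncf cache directory, `cache_is_stale` is a hypothetical helper name, and `touch -d` assumes GNU coreutils.

```shell
# Hypothetical stand-ins for the real capabilities file and cache directory.
workdir=$(mktemp -d)
capabilities="$workdir/agent-capabilities"
cache="$workdir/ncf-exclude-cache"

# Assumed helper: the cache is stale when it is missing or when the
# capabilities file has a more recent modification date than the cache.
cache_is_stale() {
  [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Simulate the packaged file: modification date a few hours in the future.
touch -d 'now + 4 hours' "$capabilities"

rebuilds=0
for run in 1 2 3; do
  if cache_is_stale "$capabilities" "$cache"; then
    # A parallel run could be reading these files at this very moment.
    rm -rf "$cache"
    mkdir "$cache"
    rebuilds=$((rebuilds + 1))
  fi
done

echo "rebuilds=$rebuilds"   # prints "rebuilds=3": the cache is never kept
rm -rf "$workdir"
```

Each run rebuilds the cache because a cache created "now" can never be newer than a file dated in the future. With several such runs in parallel, one can delete the cache while another is still reading it.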
We will touch the agent-capabilities file in postinstall to avoid this situation in the future. The race condition is not really fixable right now, but policy validation on the nodes instead of the server will fix this issue.
Note: The packages built far enough in the past have now become good (like wine or cheese), so it is not a blocking issue for 4.1.0.
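A minimal sketch of why touching the file at install time helps, under the same assumptions as the description above: the temporary paths are stand-ins for the real file and cache, this is not the actual postinstall script from the pull request, and `touch -d` assumes GNU coreutils.

```shell
# Hypothetical stand-ins for the real capabilities file and cache directory.
workdir=$(mktemp -d)
capabilities="$workdir/agent-capabilities"
cache="$workdir/ncf-exclude-cache"

# As shipped by the package: modification date in the future.
touch -d 'now + 4 hours' "$capabilities"

# Postinstall step: reset the modification date to the install time.
touch "$capabilities"

# The cache is built by a later list-compatible-inputs run.
mkdir "$cache"

# Same comparison as the invalidation check: is the file newer than the cache?
if [ "$capabilities" -nt "$cache" ]; then
  result="cache invalidated"
else
  result="cache kept"
fi

echo "$result"   # prints "cache kept"
rm -rf "$workdir"
```

Once the cache is kept between runs, parallel cf-promises checks no longer delete it under each other, except right after a genuine agent or ncf update.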
- Status changed from New to In progress
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1326
- Status changed from Pending technical review to Pending release
- Status changed from Pending release to Released
This bug has been fixed in Rudder 4.0.4 and 4.1.1 which were released today.