Bug #17268
Closed
Agent run exit code should be 0 when agent runs properly (no breaking/fatal errors)
Description
Hello,
I have noticed a strange behaviour of the Rudder agent, present in v6.0.5 but not in v5.0.17.
When running rudder agent run, even when everything works, it returns exit code 1:
# rudder agent run
Rudder agent 6.0.5-debian9
## Summary #####################################################################
58 components verified in 8 directives
=> 47 components in Enforce mode
-> 43 compliant
-> 2 repaired
-> 2 not-applicable
=> 11 components in Audit mode
-> 3 compliant
-> 6 not-applicable
-> 2 non-compliant
Execution time: 34.18s
################################################################################
# echo $?
1
There is no error, only repaired components, so I do not understand why this returns 1.
An annoying consequence is that any automation system (Ansible, or Rudder itself) used to deploy and run the agent thinks the run failed when it did not :-/
I found no information about return codes in the man page.
Is this behaviour normal, and if so, do you know why?
Thanks!
Updated by Alexis Mousset over 4 years ago
- Category set to Agent
- Target version set to 6.0.6
This is not normal at all.
What is the return code of:
/opt/rudder/bin/cf-agent -KI
?
Updated by Alexis Mousset over 4 years ago
If it returns 0, you can add set -x at the beginning of /opt/rudder/share/commands/agent-run and run rudder agent run.
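As a minimal sketch of what set -x tracing gives you, the snippet below runs a tiny inline script under sh -x and captures the trace. The inline script is purely illustrative (it is not taken from agent-run); it only reuses the variable name code2 for familiarity.

```shell
#!/bin/sh
# Run a small script with tracing enabled. `sh -x` prints each command to
# stderr (prefixed with "+ " by default) just before executing it.
traced_status=0
trace=$(sh -x -c 'code2=1; [ "$code2" -ne 0 ] && exit "$code2"; exit 0' 2>&1) \
  || traced_status=$?

# Show the captured trace and the exit status of the traced script.
printf '%s\n' "$trace"
echo "exit status: $traced_status"
```

The last traced line shows which exit statement was actually reached, which is the kind of clue to look for at the end of the agent-run trace.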
Updated by Victor Héry over 4 years ago
Hello,
I confirm that /opt/rudder/bin/cf-agent -KI returns 0.
The set -x output shows something interesting at the very end of the script:
+ rm /var/rudder/reports/ready//2020-04-28T13:12:50+00:00@ee976dac-b164-4628-887b-20c8e70f1f46.log.gz
+ [ 0 -ne 0 ]
+ exit 1
This comes from line 196 of the agent-run script:
# merge exit codes (this is the eval exit code ... POSIX ...)
[ $code1 -ne 0 ] && exit $code1
exit $code2
And code2 is indeed 1 :-/
+ code2=1
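To make the merge step concrete, here is a small sketch that wraps the same two-line logic in a function. The function name merge_codes is hypothetical; only the [ $code1 -ne 0 ] test and the fallback to code2 come from the script excerpt above.

```shell
#!/bin/sh
# merge_codes CODE1 CODE2 (hypothetical helper):
# return CODE1 if it is non-zero, otherwise return CODE2.
merge_codes() {
  [ "$1" -ne 0 ] && return "$1"
  return "$2"
}

# Case seen in the trace: code1 is 0 and code2 is 1, so the result is 1.
merged_trace_case=0
merge_codes 0 1 || merged_trace_case=$?

# A non-zero code1 takes precedence over code2.
merged_code1_wins=0
merge_codes 3 1 || merged_code1_wins=$?

# Both zero: the merged result is 0.
merged_all_ok=0
merge_codes 0 0 || merged_all_ok=$?

echo "0,1 -> $merged_trace_case"
echo "3,1 -> $merged_code1_wins"
echo "0,0 -> $merged_all_ok"
```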
From the script:
"${RUDDER_DIR}/bin/cf-agent" ${VERBOSITY} ${COLOR} -K ${BUNDLE} ${CLASS} | $log_outputs | $timestamp | $runlog_output | eval ${PRETTY}
code2=$?
I guess the problem could be that set -o pipefail is not enabled in the script (on purpose?), so the $? return code is that of the last command in the pipe, eval ${PRETTY}.
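The pipeline behaviour described here can be checked in isolation. The sketch below uses true and false as stand-ins for the real pipeline stages (they are not the commands agent-run uses) and shows that, without pipefail, only the last command's status is reported.

```shell
#!/bin/sh
# Without pipefail, the exit status of a pipeline is the exit status of its
# LAST command; failures in earlier stages are not visible in $?.

# The first stage fails, the last stage succeeds: the pipeline reports 0.
status_last_ok=0
false | true || status_last_ok=$?

# The first stage succeeds, the last stage fails: the pipeline reports 1.
status_last_fail=0
true | false || status_last_fail=$?

echo "false | true  -> $status_last_ok"
echo "true  | false -> $status_last_fail"
```

Applied to the agent-run pipe, this means a non-zero code2 points at the final eval ${PRETTY} stage rather than at cf-agent.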
PRETTY is:
+ eval awk -v info="${DISPLAY_INFO}" -v full_strings="${FULL_STRINGS}" -v+ summary_only="${SUMMARY_ONLY}" -v quiet="${QUIET}" -v multihost="${MULTIHOST}" -v green="${GREEN}" -v darkgreen="${DARKGREEN}" -v red="${RED}" -v yellow="${YELLOW}" -v magenta="${MAGENTA}" -v normal="${NORMAL}" -v white="${WHITE}" -v cyan="${CYAN}" -v dblue="${DBLUE}" -v dgreen="${DGREEN}" -v timing="${TIMING}" -v has_fflush="${AWK_FFLUSH}"tee -v full_compliance="${FULL_COMPLIANCE}" -v partial_run="${PARTIAL_RUN}" -f /opt/rudder/share/commands/../lib/reports.awk /var/rudder/tmp/reports//2020-04-28T13:12:50+00:00@ee976dac-b164-4628-887b-20c8e70f1f46.log
awk -v info=0+ -v full_strings=0 -v summary_only=0 -v quiet=0 -v multihost=0 -v green=\033[1;32m -v darkgreen=\033[0;32m -v red=\033[1;31m -v yellow=\033[1;33m -v magenta=\033[1;35m -v normal=\033[0;39m\033[0;49m -v white=\033[0;02m/opt/rudder/share/commands/../lib/timestamp -v cyan=\033[1;36m -v dblue=\033[0;34m -v dgreen=\033[0;32m
-v timing=0 -v has_fflush=OK -v full_compliance=1 -v partial_run=0 -f /opt/rudder/share/commands/../lib/reports.awk
The only thing that seems odd is the -v white=\033[0;02m/opt/rudder/share/commands/../lib/timestamp part. I cannot find where this comes from, since WHITE is perfectly correct according to the set -x output.
I have tried to launch the awk command by hand, without results.
I think I am missing something. Should I give more information (perhaps the full set -x output)?
Updated by Victor Héry over 4 years ago
Here is the full output of the agent run with set -x, hope it helps!
https://copycat.drycat.fr/?afa2e665a9b97706#Gin3S4RL7FpqjopmUf8dvroCnLmSQMd5DLC4M2Un66mq
I have just removed the output of the cf-agent as it contains my policies :)
Thanks
Updated by Alexis Mousset over 4 years ago
The error could come from the awk program itself.
Could you try adding a debug print at the end of /opt/rudder/share/lib/reports.awk, like:
# Set return code
if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
+ printf "awk error"
exit 1;
}
to see if the error comes from there?
Updated by Victor Héry over 4 years ago
Good catch. I printed each error counter to see the exact problem:
if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
printf "run %s\n", run_error;
printf "audit %s\n", audit_error;
printf "audit non compliant %s\n", audit_noncompliant;
printf "enforce %s\n", enforce_error;
exit 1;
}
And indeed, I got the error at the end:
run 0
audit 0
audit non compliant 2
enforce 0
And indeed (again :-D) there are 2 non-compliant results in the agent output, as said before:
## Summary #####################################################################
75 components verified in 11 directives
=> 64 components in Enforce mode
-> 57 compliant
-> 2 repaired
-> 5 not-applicable
=> 11 components in Audit mode
-> 3 compliant
-> 6 not-applicable
-> 2 non-compliant
Execution time: 41.72s
################################################################################
Should the presence of non-compliant rules cause this?
Here, the non-compliant rules belong to an audit policy, not an enforced one (a fileTemplate creation in audit mode, to be precise), so I think they should be ignored.
Updated by Victor Héry over 4 years ago
To be more thorough, I tried the same change on Rudder 5.0.17, and the output is the same:
## Summary #####################################################################
76 components verified in 10 directives
=> 65 components in Enforce mode
-> 58 compliant
-> 2 repaired
-> 5 not-applicable
=> 11 components in Audit mode
-> 3 compliant
-> 6 not-applicable
-> 2 non-compliant
Execution time: 7.62s
################################################################################
run 0
audit 0
audit non compliant 2
enforce 0
BUT the agent reports exit code 0.
Updated by Alexis Mousset over 4 years ago
No, we should ignore them, you're right.
Non compliant is not an error.
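As a rough sketch of the difference this makes, the snippet below evaluates the return-code condition with and without the audit_noncompliant counter. The report_status function and the count_noncompliant switch are hypothetical and exist only for this illustration; they are not part of reports.awk, and the actual change in the pull request may look different.

```shell
#!/bin/sh
# report_status RUN AUDIT NONCOMPLIANT ENFORCE COUNT_NONCOMPLIANT
# Hypothetical stand-in for the "Set return code" test in reports.awk.
# COUNT_NONCOMPLIANT=1 mimics the condition quoted in this thread;
# COUNT_NONCOMPLIANT=0 leaves audit_noncompliant out of the sum.
report_status() {
  awk -v run_error="$1" -v audit_error="$2" -v audit_noncompliant="$3" \
      -v enforce_error="$4" -v count_noncompliant="$5" '
    BEGIN {
      errors = run_error + audit_error + enforce_error
      if (count_noncompliant + 0) {
        errors += audit_noncompliant
      }
      # Exit 1 when any counted error is present, 0 otherwise.
      exit (errors != 0)
    }'
}

# Counters from the run above: only 2 audit non-compliant results.
with_noncompliant=0
report_status 0 0 2 0 1 || with_noncompliant=$?

without_noncompliant=0
report_status 0 0 2 0 0 || without_noncompliant=$?

# A real enforce error must still produce a non-zero status.
enforce_failure=0
report_status 0 0 0 1 0 || enforce_failure=$?

echo "counting non-compliant: $with_noncompliant"
echo "ignoring non-compliant: $without_noncompliant"
echo "enforce error:          $enforce_failure"
```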
Updated by Alexis Mousset over 4 years ago
- Status changed from New to In progress
- Assignee set to Alexis Mousset
Updated by Alexis Mousset over 4 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-agent/pull/297
https://github.com/Normation/rudder-agent/pull/297
Does the proposed change make sense to you?
Updated by Victor Héry over 4 years ago
It seems logical, but I am not sure: the awk script for Rudder 5 is exactly the same (with the audit_noncompliant test), yet the Rudder agent does not return exit code 1 :-/
For rudder 5:
cat /opt/rudder/share/commands/../lib/reports.awk
# Set return code
if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
printf "run %s\n", run_error;
printf "audit %s\n", audit_error;
printf "audit non compliant %s\n", audit_noncompliant;
printf "enforce %s\n", enforce_error;
exit 1;
}
And the output has the audit non compliant 2:
run 0
audit 0
audit non compliant 2
enforce 0
So I guess something has changed in the rudder-agent script itself, although I am not able to tell exactly what (even with set -x, all the variables involved in the pipe are complicated...).
Is there a risk that removing this test from the awk script breaks something else?
Updated by Alexis Mousset over 4 years ago
As far as I know, nothing in Rudder uses this exit code, so you can safely change it.
I think I know what made it return 0 in 5.0:
"${RUDDER_VAR}/cfengine-community/bin/cf-agent" ${VERBOSITY} ${COLOR} -K ${BUNDLE} ${CLASS} | tee ${logdir}/${logfile} | eval ${PRETTY}
ln -sf ${logdir}/${logfile} ${logdir}/previous
# merge exit codes (this is the eval exit code ... POSIX ...)
code2=$?
There, $? is read after the ln, so we check the exit code of the ln command instead of the agent...
At least it's now fixed.
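The effect of reading $? one command too late can be reproduced with a small sketch. All names here (fake_agent_pipeline, capture_after_ln, capture_before_ln, the temporary files) are hypothetical stand-ins; the failing function simply plays the role of a pipeline that ends with a non-zero status.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
touch "$tmpdir/run.log"

# Stand-in for the cf-agent pipeline finishing with a non-zero status.
fake_agent_pipeline() {
  return 1
}

# Order seen in the 5.0 excerpt: $? is read after ln, so it holds the
# (successful) status of ln, not that of the pipeline.
capture_after_ln() {
  fake_agent_pipeline
  ln -sf "$tmpdir/run.log" "$tmpdir/previous"
  code2=$?
  return "$code2"
}

# Reading $? immediately after the pipeline preserves its status.
capture_before_ln() {
  fake_agent_pipeline
  code2=$?
  ln -sf "$tmpdir/run.log" "$tmpdir/previous"
  return "$code2"
}

late_capture=0
capture_after_ln || late_capture=$?

early_capture=0
capture_before_ln || early_capture=$?

rm -rf "$tmpdir"

echo "status read after ln:  $late_capture"
echo "status read before ln: $early_capture"
```

This is why the same awk exit 1 went unnoticed in 5.0 and became visible once the status was captured right after the pipeline.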
Updated by Victor Héry over 4 years ago
Oh gosh.
I was so focused on some mysterious pipe result that I missed the ln, great catch!
So it was a bug inside the bug...
I'll let you discuss about the solution, thanks for the help!
Updated by Alexis Mousset over 4 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder-agent|cbd56245d000d9b559ad919b526966510f1f6ea4.
Updated by Alexis Mousset over 4 years ago
- Subject changed from rudder agent run returns exit code 1 when everything is ok to "rudder agent run" exit code should be 0 except when agent could not run properly
- Fix check changed from To do to Checked
Updated by Vincent MEMBRÉ over 4 years ago
- Subject changed from "rudder agent run" exit code should be 0 except when agent could not run properly to Agent run exit code should be 0 when agent runs properly (no breaking/fatal errors)
Updated by Vincent MEMBRÉ over 4 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.0.6 which was released today.