Bug #17268

closed

Agent run exit code should be 0 when agent runs properly (no breaking/fatal errors)

Added by Victor Héry almost 4 years ago. Updated almost 4 years ago.

Status:
Released
Priority:
N/A
Category:
Agent
Target version:
Severity:
Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
Checked
Regression:

Description

Hello,

I have noticed a strange behaviour of the rudder agent, present in v6.0.5 but not in v5.0.17.

When running rudder agent run and everything works, it returns exit code 1:

# rudder agent run
Rudder agent 6.0.5-debian9
## Summary #####################################################################
58 components verified in 8 directives
   => 47 components in Enforce mode
      -> 43 compliant
      -> 2 repaired
      -> 2 not-applicable
   => 11 components in Audit mode
      -> 3 compliant
      -> 6 not-applicable
      -> 2 non-compliant
Execution time: 34.18s
################################################################################
# echo $?
1

There is no error, only repaired components, so I do not understand why it returns 1.

The annoying consequence is that any automation system (Ansible, or Rudder itself) used to deploy and run the agent thinks the run failed when it did not :-/
I found no information about the return code in the man page.

Is this behaviour normal, and if so, do you know why?

Thanks!


Subtasks 1 (0 open, 1 closed)

Bug #17283: Only errors preventing the agent from properly running should set an error return code (Released, Benoît PECCATTE)
#1

Updated by Alexis Mousset almost 4 years ago

  • Category set to Agent
  • Target version set to 6.0.6

This is not normal at all.

What is the return code of the following command?

/opt/rudder/bin/cf-agent -KI

#2

Updated by Alexis Mousset almost 4 years ago

If it returns 0, you can add set -x at the beginning of /opt/rudder/share/commands/agent-run and run rudder agent run again.

#3

Updated by Victor Héry almost 4 years ago

Hello,

I confirm that /opt/rudder/bin/cf-agent -KI returns 0.

The set -x output shows something interesting at the very end of the script:

+ rm /var/rudder/reports/ready//2020-04-28T13:12:50+00:00@ee976dac-b164-4628-887b-20c8e70f1f46.log.gz
+ [ 0 -ne 0 ]
+ exit 1

It comes from line 196 of the agent-run script:

# merge exit codes (this is the eval exit code ... POSIX ...)
[ $code1 -ne 0 ] && exit $code1
exit $code2

And code2 is indeed 1 :-/

+ code2=1

From the script:

"${RUDDER_DIR}/bin/cf-agent" ${VERBOSITY} ${COLOR} -K ${BUNDLE} ${CLASS} | $log_outputs | $timestamp | $runlog_output | eval ${PRETTY}
code2=$?

I guess the problem could be that set -o pipefail is not enabled in the script (on purpose?), so $? is the exit code of eval ${PRETTY}, the last command of the pipe.
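This guess matches standard shell semantics: without pipefail, $? after a pipeline is the exit status of its last command only, so code2 is whatever eval ${PRETTY} (the awk report formatter) returns, regardless of what cf-agent did. A minimal sketch, assuming bash: set -o pipefail and PIPESTATUS are bash features that the /bin/sh running agent-run may not offer, and false, true and sh -c 'exit 1' are stand-ins for the real pipeline stages.

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is the status of its LAST command.
false | true
status_default=$?
echo "false | true -> $status_default"              # 0: the failure of 'false' is lost

# The last stage decides, like eval ${PRETTY} at the end of the agent-run pipe.
true | sh -c 'exit 1'
status_last=$?
echo "true | (exit 1) -> $status_last"              # 1

# With pipefail, the status is that of the rightmost failing command (or 0).
set -o pipefail
false | true
status_pipefail=$?
echo "pipefail: false | true -> $status_pipefail"   # 1
set +o pipefail

# bash also records every stage in PIPESTATUS; copy it immediately,
# because the next command overwrites it.
false | true
stages=("${PIPESTATUS[@]}")
echo "PIPESTATUS -> ${stages[*]}"                   # 1 0
```

Note that pipefail alone would not have changed anything here, since cf-agent returned 0 and awk is the stage returning 1.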

PRETTY is:

+ eval awk -v info="${DISPLAY_INFO}" -v full_strings="${FULL_STRINGS}" -v+  summary_only="${SUMMARY_ONLY}" -v quiet="${QUIET}" -v multihost="${MULTIHOST}" -v green="${GREEN}" -v darkgreen="${DARKGREEN}" -v red="${RED}" -v yellow="${YELLOW}" -v magenta="${MAGENTA}" -v normal="${NORMAL}" -v white="${WHITE}" -v cyan="${CYAN}" -v dblue="${DBLUE}" -v dgreen="${DGREEN}" -v timing="${TIMING}" -v has_fflush="${AWK_FFLUSH}"tee -v full_compliance="${FULL_COMPLIANCE}" -v partial_run="${PARTIAL_RUN}" -f /opt/rudder/share/commands/../lib/reports.awk /var/rudder/tmp/reports//2020-04-28T13:12:50+00:00@ee976dac-b164-4628-887b-20c8e70f1f46.log

awk -v info=0+  -v full_strings=0 -v summary_only=0 -v quiet=0 -v multihost=0 -v green=\033[1;32m -v darkgreen=\033[0;32m -v red=\033[1;31m -v yellow=\033[1;33m -v magenta=\033[1;35m -v normal=\033[0;39m\033[0;49m -v white=\033[0;02m/opt/rudder/share/commands/../lib/timestamp -v cyan=\033[1;36m -v dblue=\033[0;34m -v dgreen=\033[0;32m
 -v timing=0 -v has_fflush=OK -v full_compliance=1 -v partial_run=0 -f /opt/rudder/share/commands/../lib/reports.awk

The only odd thing is -v white=\033[0;02m/opt/rudder/share/commands/../lib/timestamp: I cannot find where this comes from, as WHITE is perfectly correct according to the set -x output.

I have tried running the awk command by hand, without result.

I think I am missing something. Can I give you more information? (perhaps the full set -x output?)
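The odd white value above is most likely not an awk argument at all. Each stage of the pipeline runs in its own process, and under set -x they all write trace lines to the same stderr concurrently, so a line from the $timestamp stage (and stray "+" prefixes, as in info=0+) can end up in the middle of the awk trace. A minimal sketch, assuming bash (PS4 expansion with BASHPID is a bash feature, and agent-run may run under another shell), tags every trace line with the PID of the process that printed it, which makes interleaved lines easy to attribute; the three-stage pipeline is a placeholder.

```shell
#!/usr/bin/env bash
# Run a small three-stage pipeline under xtrace in a child bash and keep
# only its trace (stderr); its normal output (stdout) is discarded.
trace=$(bash -c '
  PS4="+[\$BASHPID] "   # PS4 is expanded for every trace line
  set -x
  echo data | tr a-z A-Z | wc -l
' 2>&1 >/dev/null)

# Each line is prefixed with the PID of the stage that printed it,
# e.g. "+[12345] tr a-z A-Z"; the order of the lines may vary between runs.
printf '%s\n' "$trace"

# Count the distinct processes that wrote trace lines.
pids=$(printf '%s\n' "$trace" | sed -n 's/^+\[\([0-9]*\)] .*/\1/p' | sort -u | wc -l)
echo "distinct tracing processes: $pids"
```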

#4

Updated by Victor Héry almost 4 years ago

Here is the full output of the agent run with set -x, hope it helps!

https://copycat.drycat.fr/?afa2e665a9b97706#Gin3S4RL7FpqjopmUf8dvroCnLmSQMd5DLC4M2Un66mq

I have just removed the output of the cf-agent as it contains my policies :)

Thanks

#5

Updated by Alexis Mousset almost 4 years ago

The error could come from the awk program itself.

Could you try adding a debug print at the end of /opt/rudder/share/lib/reports.awk, like:

  # Set return code
  if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
+   printf "awk error\n"
    exit 1;
  }

to see if the error comes from there?

#6

Updated by Victor Héry almost 4 years ago

Good catch. I printed each error counter separately to see the exact problem:

  if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
    printf "run %s\n", run_error;
    printf "audit %s\n", audit_error;
    printf "audit non compliant %s\n", audit_noncompliant;
    printf "enforce %s\n", enforce_error;
    exit 1;
  }

And indeed, I got the error at the end:

run 0
audit 0
audit non compliant 2
enforce 0

And indeed (again :-D) there are 2 non-compliant results in the agent output, as mentioned before:

## Summary #####################################################################
75 components verified in 11 directives
   => 64 components in Enforce mode
      -> 57 compliant
      -> 2 repaired
      -> 5 not-applicable
   => 11 components in Audit mode
      -> 3 compliant
      -> 6 not-applicable
      -> 2 non-compliant
Execution time: 41.72s
################################################################################

Should non-compliant rules cause this?

Here, the non-compliant rules belong to an audit policy, not an enforced one (precisely, a fileTemplate creation in audit mode), so I think they should be ignored.

#7

Updated by Victor Héry almost 4 years ago

To be thorough, I tried the same change on Rudder 5.0.17, and the output is the same:

## Summary #####################################################################
76 components verified in 10 directives
   => 65 components in Enforce mode
      -> 58 compliant
      -> 2 repaired
      -> 5 not-applicable
   => 11 components in Audit mode
      -> 3 compliant
      -> 6 not-applicable
      -> 2 non-compliant
Execution time: 7.62s
################################################################################
run 0
audit 0
audit non compliant 2
enforce 0

BUT the agent reports exit code 0.

#8

Updated by Alexis Mousset almost 4 years ago

No, we should ignore them, you're right.

Non compliant is not an error.
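A minimal sketch of what ignoring non-compliance means for the check quoted above; it is not the diff from PR 297, which is not reproduced here. The counter names come from the reports.awk excerpts in this thread, while the exit_code_for helper is hypothetical and passes the counters with -v instead of computing them from agent output. With the counters reported on this node (run 0, audit 0, audit non compliant 2, enforce 0), only the original condition yields 1.

```shell
#!/bin/sh
# Hypothetical helper: evaluates an awk condition over the four counters
# and returns the exit code an END block built on it would produce.
# $1 = condition, $2..$5 = run_error audit_error audit_noncompliant enforce_error
exit_code_for() {
  awk -v run_error="$2" -v audit_error="$3" \
      -v audit_noncompliant="$4" -v enforce_error="$5" \
      "END { if ($1) exit 1; exit 0 }" </dev/null
}

old='run_error+audit_error+audit_noncompliant+enforce_error != 0'
new='run_error+audit_error+enforce_error != 0'   # non-compliance left out

# Counters from the debug output above.
exit_code_for "$old" 0 0 2 0; old_code=$?
exit_code_for "$new" 0 0 2 0; new_code=$?
echo "old condition: $old_code, new condition: $new_code"   # old 1, new 0

# An actual enforce error still produces a non-zero code with the new condition.
exit_code_for "$new" 0 0 0 1; enforce_code=$?
echo "enforce error: $enforce_code"                          # 1
```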

#9

Updated by Alexis Mousset almost 4 years ago

  • Status changed from New to In progress
  • Assignee set to Alexis Mousset
#10

Updated by Alexis Mousset almost 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-agent/pull/297

Does the proposed change make sense to you?

#11

Updated by Victor Héry almost 4 years ago

It seems logical, but I am not sure: the awk script for Rudder 5 is exactly the same (including the non-compliance check), yet the rudder agent does not return exit code 1 :-/

For Rudder 5:

cat /opt/rudder/share/commands/../lib/reports.awk
  # Set return code
  if (run_error+audit_error+audit_noncompliant+enforce_error != 0) {
    printf "run %s\n", run_error;
    printf "audit %s\n", audit_error;
    printf "audit non compliant %s\n", audit_noncompliant;
    printf "enforce %s\n", enforce_error;
    exit 1;
  }

And the output has the audit non compliant 2:

run 0
audit 0
audit non compliant 2
enforce 0

So I guess something has changed in the rudder-agent script itself, though I cannot tell exactly what (even with set -x, all the variables involved in the pipe are complicated...).

Doesn't removing this test from the awk script risk breaking something else?

#12

Updated by Alexis Mousset almost 4 years ago

As far as I know, nothing in Rudder uses this exit code, so you can safely change it.

I think I know what made it return 0 in 5.0:

"${RUDDER_VAR}/cfengine-community/bin/cf-agent" ${VERBOSITY} ${COLOR} -K ${BUNDLE} ${CLASS} | tee ${logdir}/${logfile} | eval ${PRETTY}
ln -sf ${logdir}/${logfile} ${logdir}/previous
# merge exit codes (this is the eval exit code ... POSIX ...)
code2=$?

We check the exit code of the ln command instead of the agent's...

At least it's now fixed.
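The 5.0 pitfall is easy to reproduce: $? always refers to the most recently executed command, so any command placed between the pipeline and code2=$? (here ln) silently replaces the pipeline's status. A minimal sketch, with sh -c 'exit 1' standing in for the agent pipeline and a temporary directory instead of ${logdir}:

```shell
#!/bin/sh
# Miniature of the 5.0 ordering bug: an unrelated command runs between the
# pipeline and the "$?" read, so its status is captured instead.
tmp=$(mktemp -d)

true | sh -c 'exit 1'                 # stands in for: cf-agent ... | eval ${PRETTY}
ln -sf "$tmp/log" "$tmp/previous"
buggy_code=$?                         # 0: the status of ln, not of the pipeline

true | sh -c 'exit 1'
fixed_code=$?                         # 1: read immediately after the pipeline
ln -sf "$tmp/log" "$tmp/previous"

rm -rf "$tmp"
echo "buggy: $buggy_code, fixed: $fixed_code"
```

Reading $? on the line right after the pipeline, as the 6.0 script does with code2=$?, avoids the problem.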

#13

Updated by Victor Héry almost 4 years ago

Oh gosh.
I was so focused on some mysterious pipe result that I missed the ln, great catch!

So it was a bug inside the bug...

I'll let you discuss the solution, thanks for the help!

#14

Updated by Alexis Mousset almost 4 years ago

  • Status changed from Pending technical review to Pending release
#15

Updated by Alexis Mousset almost 4 years ago

  • Subject changed from rudder agent run returns exit code 1 when everything is ok to "rudder agent run" exit code should be 0 except when agent could not run properly
  • Fix check changed from To do to Checked
#16

Updated by Vincent MEMBRÉ almost 4 years ago

  • Subject changed from "rudder agent run" exit code should be 0 except when agent could not run properly to Agent run exit code should be 0 when agent runs properly (no breaking/fatal errors)
Actions #17

Updated by Vincent MEMBRÉ almost 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.0.6 which was released today.
