Bug #3620

closed

The reporting of "Common Policies > Update" could be in a 'No Answer' status

Added by Dennis Cabooter about 11 years ago. Updated over 9 years ago.

Status: Released
Priority: N/A
Category: Techniques

Description

After upgrading from 2.5 to 2.6, "Common Policies > Update" shows No Answer status.

Actions #1

Updated by Nicolas PERRON about 11 years ago

  • Status changed from New to Discussion

This should not happen. Is this the only component in No Answer status? And are all the "Update" components in this state?

What is the output of this command on a node?

/var/rudder/cfengine-community/bin/cf-agent -KIb update

Actions #2

Updated by Vincent MEMBRÉ about 11 years ago

I found the same status on a managed node: an Update component was in No Answer.

But when I ran the agent, it produced a report, and now the component is in Success.

Actions #3

Updated by Dennis Cabooter about 11 years ago

 >> Using command line specified bundlesequence
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@73@@Update@@None@@2013-06-03 13:56:12+02:00##37cc2d4e-d5f2-401e-af72-a7c44eca00ae@#Policy and dependencies already up to date. No action required.

The status is now Success. I wonder why the Update component got the No Answer status, and whether it will come back.
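For reference, the report line above is one string: @@-separated fields, then the node UUID after "##" and the human-readable message after "@#". A minimal Python sketch of a parser for that layout follows. The field names (technique, status, rule_id, directive_id, serial, component, key_value, timestamp) are our own labels inferred from the sample lines in this thread, not an official Rudder schema.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Field names are illustrative labels inferred from the sample lines,
    # not an official Rudder schema.
    technique: str
    status: str
    rule_id: str
    directive_id: str
    serial: int
    component: str
    key_value: str
    timestamp: str
    node_id: str
    message: str


def parse_report(line: str) -> Report:
    """Parse one 'R: @@...' agent report line into its fields."""
    if line.startswith("R: "):
        line = line[3:]
    if not line.startswith("@@"):
        raise ValueError("not a Rudder report line")
    # The message follows the first '@#', the node UUID follows '##'.
    head, message = line[2:].split("@#", 1)
    head, node_id = head.split("##", 1)
    parts = head.split("@@")
    if len(parts) != 8:
        raise ValueError(f"expected 8 @@-separated fields, got {len(parts)}")
    technique, status, rule_id, directive_id, serial, component, key_value, timestamp = parts
    return Report(technique, status, rule_id, directive_id, int(serial),
                  component, key_value, timestamp, node_id, message)


if __name__ == "__main__":
    sample = ("R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@73"
              "@@Update@@None@@2013-06-03 13:56:12+02:00"
              "##37cc2d4e-d5f2-401e-af72-a7c44eca00ae"
              "@#Policy and dependencies already up to date. No action required.")
    r = parse_report(sample)
    print(r.component, r.status)  # Update result_success
```

Grouping parsed reports by (node_id, component, serial) is one way to spot a component that stopped reporting for a given run.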

Actions #4

Updated by Dennis Cabooter about 11 years ago

The status was success for a little while and is now back in No Answer state.

Actions #5

Updated by Vincent MEMBRÉ about 11 years ago

Have you changed a Rule/Directive/Group? Did a redeployment happen before the No Answer appeared?

Is this happening on all nodes, on some of them, or only on one?

Actions #6

Updated by Dennis Cabooter about 11 years ago

I didn't change a Rule/Directive/group lately - it happened to occur after the upgrade from 2.5 to 2.6. I don't know if a redeployment happened before having the NoAnswer. It's only happening on some nodes, not all.

Actions #7

Updated by Nicolas PERRON about 11 years ago

Dennis Cabooter wrote:

I didn't change a Rule/Directive/group lately - it happened to occur after the upgrade from 2.5 to 2.6. I don't know if a redeployment happened before having the NoAnswer. It's only happening on some nodes, not all.

OK, could you describe what is particular about these nodes? Do their OS, OS version, or Service Pack differ from the other nodes?
Are the system Techniques up to date? (Techniques from /opt/rudder/share/techniques/system/ should be identical to /var/rudder/configuration-repository/system/)
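The consistency check on the system Techniques can be scripted. A minimal Python sketch, assuming both directories above exist on the root server; it relies only on the standard filecmp.dircmp API and lists files that differ or exist on one side only (differing_files is a hypothetical helper name).

```python
import filecmp
import os


def differing_files(left: str, right: str) -> list:
    """Recursively list relative paths that differ or exist on one side only."""
    result = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # dircmp compares common files shallowly: files with identical
        # size and mtime are assumed equal, others are compared by content.
        for name in cmp.diff_files + cmp.left_only + cmp.right_only + cmp.funny_files:
            result.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(result)


if __name__ == "__main__":
    src = "/opt/rudder/share/techniques/system"
    dst = "/var/rudder/configuration-repository/system"
    if os.path.isdir(src) and os.path.isdir(dst):
        for path in differing_files(src, dst):
            print("differs:", path)
```

An empty output means the two trees match, at least as far as a shallow comparison can tell.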

Actions #8

Updated by Vincent MEMBRÉ about 11 years ago

Maybe we missed a case in the reporting of the Update component.

  reports:
    server_ok::
      "@@Common@@log_repaired@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Started the server (cf-serverd)";
    executor_ok::
      "@@Common@@log_repaired@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Started the scheduler (cf-execd)";

    no_update::
      "@@Common@@result_error@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Cannot update node's policy (CFEngine promises)";

    rudder_dependencies_update_error::
      "@@Common@@result_error@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Cannot update dependencies";

    rudder_promises_generated_error::
      "@@Common@@result_error@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Cannot update node's policy or dependencies";

    (rudder_promises_generated_ok|(rudder_dependencies_updated_ok.config_ok)).!(rudder_promises_generated_repaired|rudder_promises_generated_error|rudder_dependencies_updated|rudder_dependencies_update_error|config|no_update)::
      "@@Common@@result_success@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Policy and dependencies already up to date. No action required.";

    rudder_dependencies_updated::
      "@@Common@@log_repaired@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Dependencies updated";

    config::
      "@@Common@@log_repaired@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Node's policy (CFEngine promises) updated";

    rudder_promises_generated_repaired|config|rudder_dependencies_updated|server_ok|executor_ok::
      "@@Common@@result_repaired@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Policy or dependencies were updated or CFEngine service restarted";

    policy_server::
      "@@Common@@result_success@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Policy server doesn't need to be updated";

I strongly suspect that some conditions are missing here:

    (rudder_promises_generated_ok|(rudder_dependencies_updated_ok.config_ok)).!(rudder_promises_generated_repaired|rudder_promises_generated_error|rudder_dependencies_updated|rudder_dependencies_update_error|config|no_update)::
      "@@Common@@result_success@@&TRACKINGKEY&@@Update@@None@@${g.execRun}##${g.uuid}@#Policy and dependencies already up to date. No action required.";

Anyway, the reporting should indicate that the report is missing (Unknown reports) rather than showing No Answer. But that should be a separate bug.
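To test the "missing condition" hypothesis, the result_* guards from the reports section above can be transcribed into Python predicates and evaluated over every combination of the classes they mention. This is a sketch: it treats each class as an independent boolean, which over-approximates what the agent can actually define, and the report labels are our own.

```python
from itertools import product

# Every class mentioned by the result_* report guards in bundle update.
CLASSES = [
    "no_update", "rudder_dependencies_update_error",
    "rudder_promises_generated_error", "rudder_promises_generated_ok",
    "rudder_dependencies_updated_ok", "config_ok",
    "rudder_promises_generated_repaired", "rudder_dependencies_updated",
    "config", "server_ok", "executor_ok", "policy_server",
]

# Classes in the !(...) part of the "already up to date" guard.
SUCCESS_BLOCKERS = {
    "rudder_promises_generated_repaired", "rudder_promises_generated_error",
    "rudder_dependencies_updated", "rudder_dependencies_update_error",
    "config", "no_update",
}

# Classes in the guard of the result_repaired report.
REPAIR_TRIGGERS = {
    "rudder_promises_generated_repaired", "config",
    "rudder_dependencies_updated", "server_ok", "executor_ok",
}

# One predicate per result_* report; CFEngine '|' is or, '.' is and, '!' is not.
RESULT_GUARDS = {
    "error: no_update": lambda c: "no_update" in c,
    "error: dependencies": lambda c: "rudder_dependencies_update_error" in c,
    "error: policy": lambda c: "rudder_promises_generated_error" in c,
    "success: up to date": lambda c: (
        "rudder_promises_generated_ok" in c
        or ("rudder_dependencies_updated_ok" in c and "config_ok" in c)
    ) and not (SUCCESS_BLOCKERS & c),
    "repaired": lambda c: bool(REPAIR_TRIGGERS & c),
    "success: policy server": lambda c: "policy_server" in c,
}


def fired(defined: frozenset) -> list:
    """Return the labels of the result_* reports emitted for these classes."""
    return [label for label, guard in RESULT_GUARDS.items() if guard(defined)]


def silent_combinations():
    """Yield every class combination for which no result_* report fires."""
    for bits in product([False, True], repeat=len(CLASSES)):
        defined = frozenset(n for n, on in zip(CLASSES, bits) if on)
        if not fired(defined):
            yield defined
```

For example, fired(frozenset({"rudder_dependencies_updated_ok"})) returns an empty list: with the dependencies OK but neither config_ok nor rudder_promises_generated_ok defined, none of the result_* reports is emitted.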

Actions #9

Updated by Dennis Cabooter about 11 years ago

If I run cf-agent -KI, Rudder reports success. However, later on it reports No Answer again. Not on all nodes.

Actions #10

Updated by Nicolas PERRON about 11 years ago

Dennis Cabooter wrote:

If I run cf-agent -KI, Rudder reports success. However, later on it reports No Answer again. Not on all nodes.

OK, but what are the differences between these nodes? Are the OSes identical?

Actions #11

Updated by Dennis Cabooter about 11 years ago

It happens on Ubuntu 12.04 Precise LTS and RHEL 5.x nodes.

Actions #12

Updated by Nicolas PERRON about 11 years ago

This bug seems difficult to reproduce, but we may have it in our lab.
We can't trigger it on demand, so we will add some debugging messages and let it run under observation for a few hours.

Actions #13

Updated by Nicolas PERRON about 11 years ago

We have noticed that the difference between Success and No Answer really comes from the Update component not reporting, although it is defined to be launched by the bundlesequence:

R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Update@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Policy and dependencies already up to date. No action required.
R: Class 'rudder_promises_generated_ok' exist now
R: Update is launched
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@33@@common@@StartRun@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Start execution
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Security parameters@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#The internal environment security is acceptable
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Red Button@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Red Button is not in effect, continuing as normal...
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Process checking@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#There is an acceptable number of CFEngine processes running on the machine
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@CRON Daemon@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#The CRON daemon is running
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Binaries update@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#The CFengine binaries in /var/rudder/cfengine-community/bin are up to date
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@33@@Log system for reports@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Detected running syslog as syslog-ng
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@33@@Log system for reports@@None@@2013-06-05 12:56:00+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Logging system for report centralization is already correctly configured
[...]

The first 'Update' component is launched by failsafe, since cf-execd launches failsafe.cf AND promises.cf.

After forcing cf-execd to launch cf-agent in verbose mode, we noticed that Update was launched but that no class was defined:

[...]
rudder>    =========================================================
rudder>    reports in bundle update (1)
rudder>    =========================================================
[...]
rudder> Skipping whole next promise (@@Common@@result_error@@hasPolicyServer-root@@common-root@@33@@Update@@None@@2013-06-05 13:56:17+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Cannot update node's policy or dependencies), as context rudder_promises_generated_error is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>  XX Nothing promised here [last.update.reports.-centos-6-64.__Commo] (0/1 minutes elapsed)
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder> Skipping whole next promise (@@Common@@log_repaired@@hasPolicyServer-root@@common-root@@33@@Update@@None@@2013-06-05 13:56:17+02:00##36076ea3-281c-480e-be25-1392bfc39b71@#Dependencies updated), as context rudder_dependencies_updated is not relevant
[...]
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder> Skipping whole next promise (Class 'no_update' exist now), as context no_update is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>  XX Nothing promised here [last.update.reports.-centos-6-64.Class__] (0/1 minutes elapsed)
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder> Skipping whole next promise (*********************************************************************************
* rudder-agent could not get an updated configuration from the policy server.   *
* This can be caused by a network issue, an unavailable server, or if this      *
* node was deleted from the Rudder root server.                                 *
* Any existing configuration policy will continue to be applied without change. *
*********************************************************************************), as context rudder_promises_generated_error|no_update is not relevant
[...]
rudder>
rudder>    =========================================================
rudder>    reports in bundle update (2)
rudder>    =========================================================
rudder>
rudder>
[...]
rudder> Skipping whole next promise (Class 'config' exist now), as context config is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder> Skipping whole next promise (Class 'no_update' exist now), as context no_update is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[...]

The oddest part is that we can see "XX Nothing promised here [last.update.reports.-centos-6-64.Class__] (0/1 minutes elapsed)", which seems to be related to lock files. Furthermore, this message appears in place of the promise guarded by a negated class expression.

Extract of the CFEngine 3 code:

[...]
    no_update::
      "Class 'no_update' exist now";

    !no_update::
      "Class 'no_update' does not exist now";
[...]

We should have had an output like this:

[...]
rudder> Skipping whole next promise (Class 'no_update' exist now), as context no_update is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>     Promise's handle:
rudder>     Promise made by: "Class 'no_update' does not exist now" 
rudder>     .........................................................
rudder>
rudder> Report: Class 'no_update' does not exist now
rudder> R: Class 'no_update' does not exist now
[...]

Instead of:

[...]
rudder> Skipping whole next promise (Class 'no_update' exist now), as context no_update is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>  XX Nothing promised here [last.update.reports.-centos-6-64.Class__] (0/1 minutes elapsed)
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
[...]

Finally, the second pass of update shows that the !no_update class is no longer checked at all:

[...]
rudder> Skipping whole next promise (Class 'no_update' exist now), as context no_update is not relevant
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder>
rudder> . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rudder> Skipping whole next promise (*********************************************************************************
* rudder-agent could not get an updated configuration from the policy server.   *
* This can be caused by a network issue, an unavailable server, or if this      *
* node was deleted from the Rudder root server.                                 *
* Any existing configuration policy will continue to be applied without change. *
*********************************************************************************), as context rudder_promises_generated_error|no_update is not relevant
[...]
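The lock names in the verbose log ("last.update.reports.-centos-6-64.__Commo", "last.update.reports.-centos-6-64.Class__") suggest that the promiser text is canonified and cut to its first seven characters. Under that assumption, a minimal Python model shows why "Class 'no_update' exist now" and "Class 'no_update' does not exist now" would end up sharing one lock, as would every "@@Common@@..." report. This is a hypothetical reconstruction from the log lines, not CFEngine's actual lock algorithm; the host segment and the seven-character width are inferred, and lock_name/colliding_promisers are illustrative names.

```python
import re


def canonify(text: str) -> str:
    """Replace anything that is not a letter, digit or '_' with '_',
    in the spirit of CFEngine's canonify() function."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def lock_name(bundle: str, promise_type: str, host: str, promiser: str,
              width: int = 7) -> str:
    """Hypothetical reconstruction of the lock names in the verbose log:
    last.<bundle>.<promise type>.<host part>.<first `width` promiser chars>."""
    return f"last.{bundle}.{promise_type}.{host}.{canonify(promiser[:width])}"


def colliding_promisers(promisers, **kwargs) -> dict:
    """Group promisers by lock name and keep only names shared by several."""
    groups = {}
    for promiser in promisers:
        groups.setdefault(lock_name(promiser=promiser, **kwargs), []).append(promiser)
    return {name: ps for name, ps in groups.items() if len(ps) > 1}


if __name__ == "__main__":
    shared = colliding_promisers(
        ["Class 'no_update' exist now", "Class 'no_update' does not exist now"],
        bundle="update", promise_type="reports", host="-centos-6-64",
    )
    for name, ps in shared.items():
        print(name, "<-", ps)
```

If this model is right, whichever promise takes the shared lock first wins, and the other is reported as "Nothing promised here" until the lock expires.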
Actions #14

Updated by Nicolas PERRON about 11 years ago

  • Target version changed from 2.6.2 to 2.6.3
Actions #15

Updated by Nicolas CHARLES about 11 years ago

Oh, I thought I had solved this issue preemptively by persisting the classes used by reporting. I must have missed something.

Actions #16

Updated by Nicolas CHARLES about 11 years ago

OK, I think I found the problem!
There is a lock: the same reports are made both in failsafe and in the update bundle of the regular promises, so the reporting is not done during the regular promises run, hence the No Answer.
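This diagnosis can be illustrated with a toy model. It assumes a one-minute ifelapsed window (suggested by the "(0/1 minutes elapsed)" lines in the verbose log) and a lock key shared by the Update reports in failsafe.cf and promises.cf. LockTable, update_reports and the key string are hypothetical, and this is not how CFEngine stores locks internally. The no_lock flag mirrors cf-agent's -K option, which ignores locks; that is consistent with the manual cf-agent -KI runs reporting Success.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    """Toy model of per-promise 'ifelapsed' locks; times are in minutes."""
    ifelapsed: float = 1.0
    last_run: dict = field(default_factory=dict)

    def try_acquire(self, key: str, now: float, no_lock: bool = False) -> bool:
        last = self.last_run.get(key)
        # Refuse if the same key ran less than `ifelapsed` minutes ago,
        # unless locks are ignored (as cf-agent -K does).
        if not no_lock and last is not None and now - last < self.ifelapsed:
            return False
        self.last_run[key] = now
        return True


# Hypothetical key: the Update reports get the same lock name whether they
# come from failsafe.cf or from promises.cf.
UPDATE_REPORT_KEY = "last.update.reports.-centos-6-64.__Commo"


def update_reports(locks: LockTable, now: float, policy_file: str,
                   no_lock: bool = False) -> str:
    if locks.try_acquire(UPDATE_REPORT_KEY, now, no_lock):
        return f"{policy_file}: Update report sent"
    return f"{policy_file}: Update report skipped (lock held)"


if __name__ == "__main__":
    locks = LockTable(ifelapsed=1.0)
    # cf-execd runs failsafe.cf, then promises.cf a few seconds later.
    print(update_reports(locks, 0.0, "failsafe.cf"))
    print(update_reports(locks, 0.1, "promises.cf"))
    # A manual cf-agent -KI run ignores the lock.
    print(update_reports(locks, 2.0, "promises.cf", no_lock=True))
```

In this model the promises.cf pass loses the race on every scheduled run, which matches the symptom of Success after a manual -KI run followed by No Answer again later.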

Actions #17

Updated by Nicolas CHARLES about 11 years ago

  • Category set to Techniques
  • Assignee set to Nicolas CHARLES
  • Target version changed from 2.6.3 to 2.5.5
Actions #18

Updated by Nicolas CHARLES about 11 years ago

  • Status changed from Discussion to Pending technical review
  • Assignee changed from Nicolas CHARLES to Jonathan CLARKE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/146
Actions #19

Updated by Nicolas CHARLES about 11 years ago

  • Target version changed from 2.5.5 to 2.4.7
  • Pull Request changed from https://github.com/Normation/rudder-techniques/pull/146 to https://github.com/Normation/rudder-techniques/pull/147
Actions #20

Updated by Nicolas CHARLES about 11 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100

Applied in changeset commit:811d16d0ad28adf6fc51b6ff5dbfdfe762e290a1.

Actions #21

Updated by Jonathan CLARKE about 11 years ago

Applied in changeset commit:bbe2d1816cc356350bcf3631c51e91abc41d92a2.

Actions #22

Updated by Nicolas PERRON almost 11 years ago

  • Subject changed from "Common Policies > Update" shows No Answer status to The reporting of "Common Policies > Update" could be in a 'No Answer' status
Actions #23

Updated by Nicolas PERRON almost 11 years ago

  • Status changed from Pending release to Released
Actions #24

Updated by Nicolas PERRON almost 11 years ago

This bug has been fixed in Rudder 2.4.7, which was released today.
Check out:

Actions #25

Updated by Benoît PECCATTE over 9 years ago

  • Project changed from 24 to Rudder
  • Category changed from Techniques to Techniques