Bug #10881

closed

Almost all nodes go into "red state" after upgrading from 4.0.2 to 4.1.3

Added by I C over 7 years ago. Updated about 7 years ago.

Status: Rejected
Priority: N/A
Assignee: -
Category: Web - Compliance & node report
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 32
Name check:
Fix check:
Regression:

Description

Hi,

After upgrading from Rudder 4.0.2 to 4.1.3, almost 100% of my nodes went into "red state" in the dashboard and in the node list.

In the node details I see a "red status" with the following message:

"This node is sending reports from an unknown configuration policy (with configuration ID '20170303-131304-4da3122a' that is unknown to Rudder, run started at 2017-03-14 06:07:26)"

After the next scheduled run, once the nodes have finished checking compliance, everything comes back to "green state" with 100% compliance.

Here is the output of some database queries:

select * from ruddersysevents where nodeid = '019eb66d-XXXX-XXXX-XXXX-e7db93b174cd' and keyvalue = 'EndRun' order by executiontimestamp desc limit 10;

67081753 | 2017-06-08 18:02:14+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 18:01:55+02 | log_info | common | End execution with config [20170608-150958-1d4bcac]
67070378 | 2017-06-08 17:32:16+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 17:31:55+02 | log_info | common | End execution with config [20170608-150958-1d4bcac]
66967912 | 2017-06-08 12:02:49+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 12:02:25+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66844788 | 2017-06-08 06:02:22+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 06:01:58+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66721535 | 2017-06-08 00:01:58+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 00:01:31+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66606606 | 2017-06-07 18:02:27+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 18:02:04+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66484779 | 2017-06-07 12:01:57+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 12:01:37+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66368189 | 2017-06-07 06:02:33+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 06:02:10+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66245178 | 2017-06-07 00:02:10+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 00:01:43+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66130878 | 2017-06-06 18:02:40+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-06 18:02:16+02 | log_info | common | End execution with config [20170531-140238-63911ff5]

select nodeId,nodeconfigId,begindate,enddate from nodeconfigurations where nodeid='019eb66d-XXXX-XXXX-XXXX-e7db93b174cd' order by begindate desc limit 10;

019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170608-150958-1d4bcac | 2017-06-08 15:09:58.124+02 |
019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170608-150417-3f0f1fee | 2017-06-08 15:04:17.401+02 | 2017-06-08 15:09:58.124+02
019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170531-140238-63911ff5 | 2017-05-31 14:02:38.567+02 | 2017-06-08 15:04:17.401+02
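The comparison between the two outputs above can be automated. The following is a minimal sketch, not part of Rudder: it assumes the rows are pasted as pipe-separated psql output and that the config ID always appears in the last column as "End execution with config [ID]", as in the rows above. The function names are made up for illustration.

```python
import re

# Message format as it appears in the EndRun rows of this ticket.
CONFIG_ID_RE = re.compile(r"End execution with config \[([^\]]+)\]")

# Config IDs copied from the nodeconfigurations output in this ticket.
KNOWN_CONFIG_IDS = {
    "20170608-150958-1d4bcac",
    "20170608-150417-3f0f1fee",
    "20170531-140238-63911ff5",
}


def extract_config_id(row):
    """Return the config ID from one pipe-separated EndRun row, or None."""
    message = row.split("|")[-1]
    match = CONFIG_ID_RE.search(message)
    return match.group(1) if match else None


def unknown_config_ids(rows, known_ids):
    """Return config IDs found in rows but absent from known_ids, in first-seen order."""
    unknown = []
    for row in rows:
        config_id = extract_config_id(row)
        if config_id is None or config_id in known_ids:
            continue
        if config_id not in unknown:
            unknown.append(config_id)
    return unknown


# Two of the rows from the ruddersysevents output in this ticket.
rows = [
    "67081753 | 2017-06-08 18:02:14+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 18:01:55+02 | log_info | common | End execution with config [20170608-150958-1d4bcac]",
    "66967912 | 2017-06-08 12:02:49+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 12:02:25+02 | log_info | common | End execution with config [20170531-140238-63911ff5]",
]

print(unknown_config_ids(rows, KNOWN_CONFIG_IDS))
```

For the rows shown in this ticket the result is an empty list, which is consistent with the node being green again. The ID from the error message, 20170303-131304-4da3122a, is not in this node's known set.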


Related issues 2 (0 open, 2 closed)

Related to Rudder - Bug #10643: If node run interval is longer than 5 minutes, there may be "no report" at start of Rudder (Rejected)
Is duplicate of Rudder - Bug #11037: Missing agent reports after Rudder server restart (Released, François ARMAND)

#1

Updated by I C over 7 years ago

Another query for troubleshooting:

select nodeId,nodeconfigId,begindate,enddate from archivednodeconfigurations where nodeconfigid = '20170303-131304-4da3122a';
1d0197ac-977e-4a2c-b2a9-c59799613e8f | 20170303-131304-4da3122a | 2017-03-03 13:13:04.704+01 | 2017-03-14 09:59:14.863+01

The nodeId returned by the query is not the expected one (019eb66d-d6eb-4ef9-9cbf-e7db93b174cd).
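To find which node a configuration ID belongs to in both the live and the archive table at once, a query along these lines may help. This is a sketch under assumptions: it uses an in-memory SQLite database as a stand-in for the Rudder PostgreSQL database, creates only the four columns shown in the queries above, and find_config_owners is a hypothetical helper name. With a PostgreSQL driver the placeholder syntax would differ.

```python
import sqlite3

REPORTING_NODE = "019eb66d-d6eb-4ef9-9cbf-e7db93b174cd"  # node ID from this ticket
CONFIG_ID = "20170303-131304-4da3122a"  # config ID from the error message


def build_demo_db():
    """Create a minimal stand-in database holding the archived row from this ticket."""
    conn = sqlite3.connect(":memory:")
    for table in ("nodeconfigurations", "archivednodeconfigurations"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(nodeid TEXT, nodeconfigid TEXT, begindate TEXT, enddate TEXT)"
        )
    conn.execute(
        "INSERT INTO archivednodeconfigurations VALUES (?, ?, ?, ?)",
        (
            "1d0197ac-977e-4a2c-b2a9-c59799613e8f",
            CONFIG_ID,
            "2017-03-03 13:13:04.704+01",
            "2017-03-14 09:59:14.863+01",
        ),
    )
    return conn


def find_config_owners(conn, config_id):
    """Return (table name, node ID) for every row that carries config_id."""
    query = """
        SELECT 'nodeconfigurations' AS source_table, nodeid
          FROM nodeconfigurations
         WHERE nodeconfigid = ?
        UNION ALL
        SELECT 'archivednodeconfigurations' AS source_table, nodeid
          FROM archivednodeconfigurations
         WHERE nodeconfigid = ?
    """
    return conn.execute(query, (config_id, config_id)).fetchall()


conn = build_demo_db()
for source_table, node_id in find_config_owners(conn, CONFIG_ID):
    relation = "matches" if node_id == REPORTING_NODE else "differs from"
    print(f"{source_table}: owned by {node_id}, which {relation} the reporting node")
```

With the data from this ticket, the only owner found is in the archive table and it differs from the reporting node, which is the mismatch described above.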

#2

Updated by François ARMAND over 7 years ago

OK, so it seems that:

- something startled the compliance algorithm, which chose to look at a very old run,
- the corresponding configuration was moved to the archive table in the meantime.

Several points to investigate:

- why was such an old run chosen?
- why is the compliance ID not for the corresponding node?

#3

Updated by François ARMAND over 7 years ago

  • Target version set to 4.1.4
  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings

I'm setting the version to 4.1 (because it happens on migration to 4.1), but the same logic may already be present in 4.0.
The severity is minor, because simply regenerating policies brought compliance back to a correct state.

#4

Updated by François ARMAND over 7 years ago

  • Priority changed from 0 to 17
#5

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 4.1.4 to 4.1.5
#6

Updated by Alexis Mousset over 7 years ago

  • Target version changed from 4.1.5 to 4.1.6
#7

Updated by Benoît PECCATTE over 7 years ago

  • Priority changed from 17 to 32
#8

Updated by François ARMAND over 7 years ago

  • Related to Bug #11037: Missing agent reports after Rudder server restart added
#9

Updated by Nicolas CHARLES over 7 years ago

  • Related to Bug #10643: If node run interval is longer than 5 minutes, there may be "no report" at start of Rudder added
#10

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 4.1.6 to 4.1.7
#11

Updated by François ARMAND about 7 years ago

  • Status changed from New to Rejected

So, now that we know what the problem was in #11037, we are (almost) sure that it is the same problem here.

I'm closing it as "duplicate" but if you see it happen again, please reopen.

#12

Updated by François ARMAND about 7 years ago

  • Related to deleted (Bug #11037: Missing agent reports after Rudder server restart)
#13

Updated by François ARMAND about 7 years ago

  • Is duplicate of Bug #11037: Missing agent reports after Rudder server restart added