Bug #10881

Almost all nodes go into "red state" after upgrading from 4.0.2 to 4.1.3

Added by Ilan COSTA about 3 years ago. Updated almost 3 years ago.

Status: Rejected
Priority: N/A
Assignee: -
Category: Web - Compliance & node report
Target version:
Pull Request:
Severity: Minor - inconvenience | misleading | easy workaround
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 32

Description

Hi,

After upgrading from Rudder 4.0.2 to 4.1.3, almost 100% of my nodes went into a "red state" in the dashboard and in the node list.

In the node details I get a "red status" with the following message:

"This node is sending reports from an unknown configuration policy (with configuration ID '20170303-131304-4da3122a' that is unknown to Rudder, run started at 2017-03-14 06:07:26)"

Once the next scheduled run has happened and the nodes have finished checking compliance, everything comes back to a "green state" with 100% compliance.

Here is the output of some database queries:

select * from ruddersysevents where nodeid = '019eb66d-XXXX-XXXX-XXXX-e7db93b174cd' and keyvalue = 'EndRun' order by executiontimestamp desc limit 10;

67081753 | 2017-06-08 18:02:14+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 18:01:55+02 | log_info | common | End execution with config [20170608-150958-1d4bcac]
67070378 | 2017-06-08 17:32:16+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 17:31:55+02 | log_info | common | End execution with config [20170608-150958-1d4bcac]
66967912 | 2017-06-08 12:02:49+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 12:02:25+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66844788 | 2017-06-08 06:02:22+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 06:01:58+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66721535 | 2017-06-08 00:01:58+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-08 00:01:31+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66606606 | 2017-06-07 18:02:27+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 18:02:04+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66484779 | 2017-06-07 12:01:57+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 12:01:37+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66368189 | 2017-06-07 06:02:33+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 06:02:10+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66245178 | 2017-06-07 00:02:10+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-07 00:01:43+02 | log_info | common | End execution with config [20170531-140238-63911ff5]
66130878 | 2017-06-06 18:02:40+02 | 019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | common-root | hasPolicyServer-root | 70 | common | EndRun | 2017-06-06 18:02:16+02 | log_info | common | End execution with config [20170531-140238-63911ff5]

select nodeId,nodeconfigId,begindate,enddate from nodeconfigurations where nodeid='019eb66d-XXXX-XXXX-XXXX-e7db93b174cd' order by begindate desc limit 10;

019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170608-150958-1d4bcac | 2017-06-08 15:09:58.124+02 |
019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170608-150417-3f0f1fee | 2017-06-08 15:04:17.401+02 | 2017-06-08 15:09:58.124+02
019eb66d-XXXX-XXXX-XXXX-e7db93b174cd | 20170531-140238-63911ff5 | 2017-05-31 14:02:38.567+02 | 2017-06-08 15:04:17.401+02
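
The two result sets above can be cross-checked by hand, or with a small script. The following is only a sketch: the config IDs are copied from the query output and the error message in this report, while the helper names (`extract_config_id`, `unknown_config_ids`) are hypothetical and not part of Rudder. It suggests that the recent EndRun reports all reference configurations known for this node, whereas the ID quoted in the error message does not appear among them.

```python
import re

# Config IDs from the EndRun messages in the ruddersysevents output above
# (deduplicated by hand; only two distinct IDs appear in the ten rows).
end_run_messages = [
    "End execution with config [20170608-150958-1d4bcac]",
    "End execution with config [20170531-140238-63911ff5]",
]

# nodeconfigid values returned by the nodeconfigurations query above.
known_config_ids = {
    "20170608-150958-1d4bcac",
    "20170608-150417-3f0f1fee",
    "20170531-140238-63911ff5",
}

# Config ID quoted in the "unknown configuration policy" message.
error_config_id = "20170303-131304-4da3122a"

CONFIG_RE = re.compile(r"End execution with config \[([^\]]+)\]")


def extract_config_id(message):
    """Return the config ID between brackets, or None if the message does not match."""
    match = CONFIG_RE.search(message)
    return match.group(1) if match else None


def unknown_config_ids(messages, known):
    """Return the sorted config IDs reported in messages that are absent from known."""
    reported = {extract_config_id(m) for m in messages}
    reported.discard(None)
    return sorted(reported - known)


# Recent runs: every reported config ID is known for this node.
print(unknown_config_ids(end_run_messages, known_config_ids))
# The ID from the error message is not among the node's current configurations.
print(error_config_id in known_config_ids)
```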


Related issues

Related to Rudder - Bug #10643: If node run interval is longer than 5 minutes, there may be "no report" at start of Rudder (Rejected)
Is duplicate of Rudder - Bug #11037: Missing agent reports after Rudder server restart (Released, François ARMAND)
#1

Updated by Ilan COSTA about 3 years ago

Another query for troubleshooting:

select nodeId,nodeconfigId,begindate,enddate from archivednodeconfigurations where nodeconfigid = '20170303-131304-4da3122a';
1d0197ac-977e-4a2c-b2a9-c59799613e8f | 20170303-131304-4da3122a | 2017-03-03 13:13:04.704+01 | 2017-03-14 09:59:14.863+01

The nodeId returned by the query is not the expected one (019eb66d-d6eb-4ef9-9cbf-e7db93b174cd).

#2

Updated by François ARMAND about 3 years ago

OK, so it seems that:

- something startled the compliance algorithm, which chose to look at a very old run,
- the corresponding configuration had, in parallel, been moved to the archive table.

Several points to investigate:

- why was such an old run chosen?
- why does the compliance ID not belong to the corresponding node?

#3

Updated by François ARMAND about 3 years ago

  • Target version set to 4.1.4
  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings

I'm setting the target version to 4.1 (because it happens on migration to 4.1), but the same logic may already be present in 4.0.
The severity is minor, because simply regenerating policies brought compliance back to a correct state.

#4

Updated by François ARMAND about 3 years ago

  • Priority changed from 0 to 17
#5

Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 4.1.4 to 4.1.5
#6

Updated by Alexis MOUSSET about 3 years ago

  • Target version changed from 4.1.5 to 4.1.6
#7

Updated by Benoît PECCATTE about 3 years ago

  • Priority changed from 17 to 32
#8

Updated by François ARMAND about 3 years ago

  • Related to Bug #11037: Missing agent reports after Rudder server restart added
#9

Updated by Nicolas CHARLES about 3 years ago

  • Related to Bug #10643: If node run interval is longer than 5 minutes, there may be "no report" at start of Rudder added
#10

Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 4.1.6 to 4.1.7
#11

Updated by François ARMAND almost 3 years ago

  • Status changed from New to Rejected

So, now that we know what the problem was in #11037, we are (almost) sure that it is the same problem here.

I'm closing it as "duplicate", but if you see it happen again, please reopen.

#12

Updated by François ARMAND almost 3 years ago

  • Related to deleted (Bug #11037: Missing agent reports after Rudder server restart)
#13

Updated by François ARMAND almost 3 years ago

  • Is duplicate of Bug #11037: Missing agent reports after Rudder server restart added
