Bug #27264

open

Random error after node-to-relay is applied and other dynamic group and node accepted by API problems

Added by Nicolas CHARLES 26 days ago. Updated 5 days ago.

Status:
Pending technical review
Priority:
1 (highest)
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
To do
Regression:
No

Description

after doing a node-to-relay, my relay only has

At a customer site, we have also seen a common rule that was not applied on a node behind a relay.
There is something odd going on.


Files

clipboard-202507111019-jh9qi.png (35.8 KB) Nicolas CHARLES, 2025-07-11 10:19

Actions #1

Updated by Nicolas CHARLES 26 days ago

Forcing a full generation update solves the issue

Actions #2

Updated by Nicolas CHARLES 26 days ago

It seems that retriggering the inventory on the relay was what solved the issue.

Actions #3

Updated by François ARMAND 13 days ago

  • Assignee set to François ARMAND
  • Priority changed from To review to 1 (highest)

Perhaps this has something to do with the order in which dynamic groups are computed, and with whether the relay is already seen as a policy server at that point.

Actions #4

Updated by François ARMAND 6 days ago

  • Subject changed from Random error after node-to-relay is applied to Random error after node-to-relay is applied and other dynamic group and node accepted by API problems
Actions #5

Updated by François ARMAND 6 days ago

It also happens for new nodes created through the API.

It might be because the node is accepted during a group computation, so that it belongs to some of the groups (the ones not yet computed) but not to the others (the ones already computed when the node is added to LDAP).
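
To make the hypothesis above concrete, here is a minimal sketch of the suspected race, written in Python for brevity. It is not Rudder code: the group names, node ids, and the `on_step` hook are all hypothetical, and the "snapshot" variant is only one possible mitigation, not necessarily what the actual fix does.

```python
# Hypothetical model: every dynamic group matches all known nodes, and groups
# are computed one after the other. "known_nodes" stands in for LDAP.

def compute_groups(group_names, known_nodes, on_step=None):
    """Compute groups in order, each one reading the LIVE node set."""
    membership = {}
    for i, name in enumerate(group_names):
        if on_step is not None:
            on_step(i)  # lets a caller accept a node in the middle of the run
        membership[name] = set(known_nodes)
    return membership


def compute_groups_from_snapshot(group_names, known_nodes, on_step=None):
    """Same loop, but every group reads one frozen copy of the node set."""
    frozen = frozenset(known_nodes)
    membership = {}
    for i, name in enumerate(group_names):
        if on_step is not None:
            on_step(i)
        membership[name] = set(frozen)
    return membership


def accept_during(step, nodes, node_id):
    """Build a hook that adds node_id to nodes just before the given step."""
    def hook(i):
        if i == step:
            nodes.add(node_id)
    return hook


GROUPS = ["group-1", "group-2", "group-3"]

# Live read: the node is accepted after group-1 was computed.
live_nodes = {"node-a"}
live = compute_groups(GROUPS, live_nodes, accept_during(1, live_nodes, "node-new"))
print(live)

# Snapshot read: the new node is left out of every group in this run.
snap_nodes = {"node-a"}
snap = compute_groups_from_snapshot(
    GROUPS, snap_nodes, accept_during(1, snap_nodes, "node-new")
)
print(snap)
```

With the live read, `node-new` ends up in `group-2` and `group-3` but not in `group-1`, which is the kind of partial membership described above. With the frozen copy, the node is absent from all groups for this run and would be picked up consistently by the next one.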

Actions #6

Updated by François ARMAND 5 days ago · Edited

We were able to identify a sequence of events that leads to something similar on a node behind a relay (details below). In that case, the generation failed because a file wasn't found for the new node. With a different timing, it could instead be another group whose computation is not yet done, and that group could lead to a silently missing "promises.cf".

  • start generation "A"
    2025-07-15 12:04:28+0200 INFO  nodes - Update in node '(no relevant)' inventories main information detected: triggering dynamic group update and a policy generation
    
  • the node with the missing promises.cf is created through the API
    2025-07-15 12:04:32+0200 INFO  nodes - API request for creating nodes: [d4c955a2-74d8-4908-9c95-139ebba251b2 (accepted)]
    (snip)
  • dynamic groups were being updated, and some start to include the new node; see the appearance of the second node here:
    2025-07-15 12:04:36+0200 INFO  dynamic-group - Dynamic group f291e535-...: added node with id: [ 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing
    2025-07-15 12:04:36+0200 INFO  dynamic-group - Dynamic group 14e5ec74-...: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2, 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing
    
  • generation "A" snapshotted all data for the generation
    2025-07-15 12:04:38+0200 INFO  policy.generation - [metrics] Xmx:....
    
  • the new node is detected and asks for a generation
    2025-07-15 12:04:43+0200 INFO  nodes - Update in node 'd4c955a2-74d8-4908-9c95-139ebba251b2' inventories main information detected: triggering dynamic group update and a policy generation
    2025-07-15 12:04:44+0200 INFO  dynamic-group - Dynamic group all-nodes-with-cfengine-agent: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing
    2025-07-15 12:04:44+0200 INFO  dynamic-group - Dynamic group hasPolicyServer-39ef812a-XXX: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing
    (snip)
  • generation "A" ends in error
    2025-07-15 12:05:50+0200 INFO  policy.generation.timing - Policy generation failed after: 1 min 28 s
    Can't stat file '/var/rudder/share/39ef812a-XXX/share/d4c955a2-74d8-4908-9c95-139ebba251b2/rules.new/cfengine-community/promises.cf' for parsing. (stat: No such file or directory)][stderr:] (for node(s) 
  • generation "B" starts
    2025-07-15 12:05:50+0200 INFO  policy.generation - Start policy generation, checking updated rules
    (snip)
  • generation "B" succeeds
    2025-07-15 12:07:09+0200 INFO  policy.generation.manager - Successful policy update 'xxx' [started 2025-07-15 12:05:50 - ended 2025-07-15 12:07:09]
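
The timing in the sequence above can be checked mechanically. The following Python sketch parses three of the log lines quoted in this comment and tests whether the API acceptance falls between the event that triggered generation "A" and its data snapshot. The regular expression and helper names are hypothetical, and treating the `[metrics]` line as the snapshot marker is an assumption taken from the labels used in this comment, not a documented property of the log format.

```python
import re
from datetime import datetime

# Three log lines copied from the timeline in this comment.
LOG = """\
2025-07-15 12:04:28+0200 INFO  nodes - Update in node '(no relevant)' inventories main information detected: triggering dynamic group update and a policy generation
2025-07-15 12:04:32+0200 INFO  nodes - API request for creating nodes: [d4c955a2-74d8-4908-9c95-139ebba251b2 (accepted)]
2025-07-15 12:04:38+0200 INFO  policy.generation - [metrics] Xmx:....
"""

# Hypothetical pattern: "<date> <time+offset> <LEVEL>  <logger> - <message>"
LINE_RE = re.compile(r"^(\S+ \S+) (\w+)\s+(\S+) - (.*)$")


def parse(line):
    """Return (timestamp, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts, _level, logger, message = m.groups()
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S%z"), logger, message


def first_time(events, logger, needle):
    """Timestamp of the first event from the logger whose message contains needle."""
    for ts, ev_logger, message in events:
        if ev_logger == logger and needle in message:
            return ts
    return None


events = [e for e in map(parse, LOG.splitlines()) if e is not None]

triggered = first_time(events, "nodes", "triggering dynamic group update")
accepted = first_time(events, "nodes", "API request for creating nodes")
snapshot = first_time(events, "policy.generation", "[metrics]")

# True when the node was accepted while generation "A" was still collecting data.
accepted_in_window = triggered <= accepted <= snapshot
print(accepted_in_window)
```

On the lines quoted here, the acceptance (12:04:32) sits inside the window between the trigger (12:04:28) and the snapshot (12:04:38), which matches the scenario described in this comment.
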
Actions #7

Updated by François ARMAND 5 days ago

  • Status changed from New to In progress
Actions #8

Updated by François ARMAND 5 days ago

  • Status changed from In progress to Pending technical review
  • Pull Request set to https://github.com/Normation/rudder/pull/6552