Bug #27264
Random error after node-to-relay is applied and other dynamic group and node accepted by API problems
Description
after doing a node-to-relay, my relay only has
At a customer site, we've seen a common rule missing on a node behind a relay.
there is something odd going on
Updated by Nicolas CHARLES 26 days ago
Forcing a full generation update solves the issue
Updated by Nicolas CHARLES 26 days ago
it seems that retriggering the inventory on the relay was what solved the issue
Updated by François ARMAND 13 days ago
- Assignee set to François ARMAND
- Priority changed from To review to 1 (highest)
Perhaps it has something to do with the order in which dynamic groups are computed, and with whether the relay is already seen as a policy server at that point.
Updated by François ARMAND 6 days ago
- Subject changed from Random error after node-to-relay is applied to Random error after node-to-relay is applied and other dynamic group and node accepted by API problems
Updated by François ARMAND 6 days ago
It also happens for new nodes created through the API.
It might be because the node is accepted during a group computation, so that it belongs to some of the groups (the ones not yet computed) but not to the others (the ones already computed when the node is added to LDAP).
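The suspected race can be reproduced in a few lines. This is only a toy model, not Rudder's code and not the change made in the pull request: `compute_groups`, `run`, the group names and the node names are all hypothetical. Groups are computed one after the other over a shared node set, and a node is accepted between the first and the second group.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Criterion = Callable[[str], bool]


def compute_groups(
    nodes: Set[str],
    groups: List[Tuple[str, Criterion]],
    after_each: Optional[Callable[[str], None]] = None,
    snapshot: bool = False,
) -> Dict[str, Set[str]]:
    """Compute group membership one group at a time (hypothetical model).

    With snapshot=False every group reads the live node set, so a node
    accepted mid-computation is only seen by the groups computed afterwards.
    With snapshot=True all groups read the same frozen copy.
    """
    source = frozenset(nodes) if snapshot else nodes
    result: Dict[str, Set[str]] = {}
    for name, matches in groups:
        result[name] = {n for n in source if matches(n)}
        if after_each is not None:
            after_each(name)  # simulates a concurrent event
    return result


def run(snapshot: bool) -> Dict[str, Set[str]]:
    nodes = {"existing-node"}
    # Both groups match every node, so they should always be identical.
    groups = [("group-1", lambda n: True), ("group-2", lambda n: True)]

    def accept_new_node(done: str) -> None:
        # The node is accepted (added to the store) right after group-1.
        if done == "group-1":
            nodes.add("new-node")

    return compute_groups(nodes, groups, accept_new_node, snapshot)


racy = run(snapshot=False)
consistent = run(snapshot=True)
print(racy)        # group-1 misses new-node, group-2 has it
print(consistent)  # both groups agree; new-node waits for the next run
```

In the racy run the new node ends up in `group-2` only, which matches the "some groups but not others" symptom. In the snapshot run both groups agree and the new node is simply picked up by the next computation.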
Updated by François ARMAND 5 days ago
· Edited
We were able to identify a sequence of events that leads to something similar on a node behind a relay (details below). In that case, the generation failed because a file wasn't found for the new node. With a different timing, it could be another group whose computation is not yet done, and that group would lead to a silently missing "promises.cf".
- start generation "A"
2025-07-15 12:04:28+0200 INFO nodes - Update in node '(no relevant)' inventories main information detected: triggering dynamic group update and a policy generation
- The node with the missing promises.cf is created through the API
2025-07-15 12:04:32+0200 INFO nodes - API request for creating nodes: [d4c955a2-74d8-4908-9c95-139ebba251b2 (accepted)] (snip)
- dynamic groups were being updated, and some start to include the new node. See the appearance of the second node here:
2025-07-15 12:04:36+0200 INFO dynamic-group - Dynamic group f291e535-...: added node with id: [ 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing
2025-07-15 12:04:36+0200 INFO dynamic-group - Dynamic group 14e5ec74-...: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 , 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing
- generation "A" snapshotted all data for the generation
2025-07-15 12:04:38+0200 INFO policy.generation - [metrics] Xmx:....
- node detected and asking for a generation
2025-07-15 12:04:43+0200 INFO nodes - Update in node 'd4c955a2-74d8-4908-9c95-139ebba251b2' inventories main information detected: triggering dynamic group update and a policy generation
2025-07-15 12:04:44+0200 INFO dynamic-group - Dynamic group all-nodes-with-cfengine-agent: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing
2025-07-15 12:04:44+0200 INFO dynamic-group - Dynamic group hasPolicyServer-39ef812a-XXX: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing
(snip)
- End of generation A in error
2025-07-15 12:05:50+0200 INFO policy.generation.timing - Policy generation failed after: 1 min 28 s Can't stat file '/var/rudder/share/39ef812a-XXX/share/d4c955a2-74d8-4908-9c95-139ebba251b2/rules.new/cfengine-community/promises.cf' for parsing. (stat: No such file or directory)][stderr:] (for node(s)
- starts generation "B"
2025-07-15 12:05:50+0200 INFO policy.generation - Start policy generation, checking updated rules (snip)
- Generation "B" success
2025-07-15 12:07:09+0200 INFO policy.generation.manager - Successful policy update 'xxx' [started 2025-07-15 12:05:50 - ended 2025-07-15 12:07:09]
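To check other log files for the same pattern, the timeline above can be extracted mechanically. The following is a rough sketch, not a Rudder tool: the helper names (`groups_adding_node`, `split_at`) are made up, the regular expressions only cover the `dynamic-group` line format quoted in this ticket, and it assumes that the `[metrics]` line at 12:04:38 marks the moment generation "A" took its snapshot.

```python
import re
from datetime import datetime
from typing import List, Tuple

# "<date> <time+tz> INFO <logger> - <message>", as in the excerpts above
LINE = re.compile(r"^(\S+ \S+) INFO (\S+) - (.*)$")
ADDED = re.compile(r"Dynamic group (.+?): added node with id: \[(.*?)\]")


def parse_ts(raw: str) -> datetime:
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S%z")


def groups_adding_node(lines: List[str], node_id: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, group) for each dynamic-group line adding node_id."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m or m.group(2) != "dynamic-group":
            continue
        added = ADDED.search(m.group(3))
        if added and node_id in [i.strip() for i in added.group(2).split(",")]:
            hits.append((parse_ts(m.group(1)), added.group(1)))
    return hits


def split_at(hits: List[Tuple[datetime, str]], snapshot: datetime):
    """Split group names into those updated before and after the snapshot."""
    before = [g for ts, g in hits if ts <= snapshot]
    after = [g for ts, g in hits if ts > snapshot]
    return before, after


# Log lines copied from this ticket (elided ids kept as-is).
LOG = [
    "2025-07-15 12:04:36+0200 INFO dynamic-group - Dynamic group f291e535-...: added node with id: [ 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing",
    "2025-07-15 12:04:36+0200 INFO dynamic-group - Dynamic group 14e5ec74-...: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 , 0b1c201d-0ac4-4ee8-86f8-77bb4c0b4bcc ], removed: nothing",
    "2025-07-15 12:04:44+0200 INFO dynamic-group - Dynamic group all-nodes-with-cfengine-agent: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing",
    "2025-07-15 12:04:44+0200 INFO dynamic-group - Dynamic group hasPolicyServer-39ef812a-XXX: added node with id: [ d4c955a2-74d8-4908-9c95-139ebba251b2 ], removed: nothing",
]

NODE = "d4c955a2-74d8-4908-9c95-139ebba251b2"
SNAPSHOT = parse_ts("2025-07-15 12:04:38+0200")  # assumed snapshot time

before, after = split_at(groups_adding_node(LOG, NODE), SNAPSHOT)
print("in snapshot:", before)
print("after snapshot:", after)
```

On the lines quoted here, the node is already in one dynamic group when the snapshot is taken, while the `all-nodes-with-cfengine-agent` and `hasPolicyServer-39ef812a-XXX` groups only receive it afterwards. That split is consistent with generation "A" seeing the node in a partial state.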
Updated by François ARMAND 5 days ago
- Status changed from In progress to Pending technical review
- Pull Request set to https://github.com/Normation/rudder/pull/6552