Bug #25457
closed
One out of 1000 agent runs is missed
Description
On a user installation with a large number of nodes, we observed that around 0.1% of cfengine agent runs are missing.
It does not seem to be linked to a particular node OS, to particular hardware, or to anything else specific.
On that installation, the splaytime is 7 min for an agent run interval of 15 min.
That installation is on 7.3, but there is no reason to think 8.1 is not affected too.
It also happens on a smaller platform with only 1000 nodes (it just takes more care to spot).
Updated by François ARMAND 2 months ago
- Priority changed from N/A to 1 (highest)
Updated by François ARMAND 2 months ago
The first step would be to identify how we can reproduce it, so that we can check our correction.
At first glance, it could be a "<" in place of a "<=" somewhere, perhaps when cf-execd wakes up exactly at the moment the next run is due. But this is a guessing game, so we need to be able to test our changes.
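To make the hypothesis concrete, here is a minimal sketch of how a strict comparison could drop a run. It is not cf-execd code: `is_due_strict`, `is_due_inclusive`, `simulate` and the wake-up times are all hypothetical, and it assumes the daemon wakes up once per scheduled slot and checks whether the run is due.

```python
def is_due_strict(now: int, scheduled: int) -> bool:
    # Hypothetical buggy check: waking up exactly at the scheduled
    # second is NOT treated as due.
    return scheduled < now


def is_due_inclusive(now: int, scheduled: int) -> bool:
    # Hypothetical fixed check: the scheduled second itself counts as due.
    return scheduled <= now


def simulate(wakeups, is_due):
    """Return the scheduled times (seconds) whose run actually started.

    `wakeups` is a list of (scheduled, actual_wakeup) pairs, one per slot.
    """
    return [scheduled for scheduled, now in wakeups if is_due(now, scheduled)]


# Four 15-minute slots; the third wake-up lands exactly on the scheduled second.
wakeups = [(0, 1), (900, 901), (1800, 1800), (2700, 2702)]

ran_strict = simulate(wakeups, is_due_strict)        # the 1800 slot is skipped
ran_inclusive = simulate(wakeups, is_due_inclusive)  # every slot runs
```

If the real cause is of this kind, a run would only be lost when the wake-up coincides exactly with the scheduled second, which would fit a rare miss rate.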
Updated by Nicolas CHARLES 2 months ago
On our systems, it happens only on Linux systems.
The common point of the skipped runs is that, for a given system, they are all skipped at the exact same second. For example, all skipped runs on our monitoring node come right after a run starting at 00:00 (but that does not mean that every run following a 00:00 run is skipped).
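A pattern like this can be checked on other nodes with a small script. The sketch below assumes the run start times of one node are already available as a sorted list of epoch seconds (how they are extracted is not covered here); `runs_before_gaps`, `gap_times_of_day` and the `tolerance` parameter are hypothetical names chosen for the illustration.

```python
from collections import Counter


def runs_before_gaps(starts, interval=900, tolerance=0.5):
    """Return the start times of runs followed by an abnormally long gap.

    A gap is flagged when it exceeds interval * (1 + tolerance) seconds.
    """
    limit = interval * (1 + tolerance)
    return [prev for prev, cur in zip(starts, starts[1:]) if cur - prev > limit]


def gap_times_of_day(starts, interval=900):
    """Count, per second of day (UTC), how often a run was followed by a gap."""
    return Counter(prev % 86400 for prev in runs_before_gaps(starts, interval))


# Synthetic data: two days of 15-minute runs, with the run right after
# 00:00 missing on both days.
all_slots = range(0, 2 * 86400, 900)
starts = [t for t in all_slots if t not in (900, 86400 + 900)]

before_gaps = runs_before_gaps(starts)  # runs at 00:00 on each day
by_time = gap_times_of_day(starts)      # both gaps follow second 0 of the day
```

If the counter is concentrated on a single second of day for each node, that matches the observation above.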
Updated by François ARMAND 2 months ago
- Is duplicate of Bug #25505: Backport scheduling fixes for cf-execd added