Bug #25457
closed
One out of 1000 agent runs is missed
Added by François ARMAND 2 months ago.
Updated 2 months ago.
Description
On a user installation with many nodes, it was observed that around 0.1% of cfengine agent runs are missing.
It does not seem to be linked to a particular node OS, particular hardware, or anything else specific to the affected nodes.
On that installation, the splaytime is 7 min for an agent run frequency of 15 min.
This is on 7.3, but there is no reason to believe 8.1 is not affected as well.
It also happens on a smaller platform with only 1000 nodes (it just takes more care to notice).
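To give an idea of scale, here is a rough back-of-the-envelope calculation using the figures from the description (15 min run frequency, about 0.1% missed runs, 1000 nodes). The function names are only illustrative, and the miss rate is assumed to be uniform across nodes, which may not be the case in practice.

```python
def runs_per_day(interval_min: int) -> int:
    """Number of agent runs one node schedules per day."""
    return 24 * 60 // interval_min


def expected_missed_per_day(nodes: int, interval_min: int, miss_rate: float) -> float:
    """Expected number of missed runs per day over the whole platform."""
    return nodes * runs_per_day(interval_min) * miss_rate


def days_between_misses(interval_min: int, miss_rate: float) -> float:
    """Average number of days between two missed runs on a single node."""
    return 1 / (runs_per_day(interval_min) * miss_rate)


print(runs_per_day(15))                                 # runs per node per day
print(round(expected_missed_per_day(1000, 15, 0.001)))  # missed runs per day, 1000 nodes
print(round(days_between_misses(15, 0.001), 1))         # days between misses, one node
```

Under these assumptions, a 1000-node platform loses on the order of 96 runs per day, but a single node only misses a run about once every 10 days, which is why it is easy to overlook.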
- Priority changed from N/A to 1 (highest)
The first step would be to identify how we can reproduce it, so that we can check our fix.
At first glance, it looks like it could be a "<" in place of a "<=" somewhere, perhaps when cf-execd wakes up exactly at the moment it should start the next run. But this is a guessing game, so we need to be able to test our changes.
On our systems, it happens only on Linux systems.
The common point of the skipped runs is that, for a given system, they are all skipped at the exact same second. For example, all skipped runs on our monitoring node come right after a run starting at 00:00 (which does not mean that every run following a 00:00 run is skipped).
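The "<" versus "<=" hypothesis can be illustrated with a small simulation. This is only a sketch of the suspected class of bug, not cf-execd's actual scheduling code: the 60 s wake-up period (`PULSE`), the per-node `offset`, and all function names are assumptions made for the example.

```python
INTERVAL = 15 * 60  # agent run frequency from the report: 15 minutes
PULSE = 60          # ASSUMED wake-up period of the hypothetical daemon, in seconds


def due_strict(now: int, offset: int) -> bool:
    """Hypothetical buggy check: waking up exactly on the boundary does not count."""
    pos = (now - offset) % INTERVAL
    return 0 < pos < PULSE


def due_inclusive(now: int, offset: int) -> bool:
    """Same check with an inclusive lower bound."""
    pos = (now - offset) % INTERVAL
    return 0 <= pos < PULSE


def count_runs(is_due, first_wake: int, offset: int, duration: int) -> int:
    """Count the runs started by a daemon waking every PULSE seconds."""
    wakeups = range(first_wake, first_wake + duration, PULSE)
    return sum(1 for now in wakeups if is_due(now, offset))


offset = 7 * 60  # hypothetical splay offset of one node (7 min)

# Daemon wakes up exactly on the run boundary: the strict check never fires.
print(count_runs(due_strict, offset, offset, 3600))
print(count_runs(due_inclusive, offset, offset, 3600))

# Daemon wakes up one second later: both checks behave the same.
print(count_runs(due_strict, offset + 1, offset, 3600))
print(count_runs(due_inclusive, offset + 1, offset, 3600))
```

In this model, a run is lost only when a wake-up lands on the exact second of the boundary, which would be consistent with skipped runs sharing the same second on a given system. Whether the real code has this shape still needs to be confirmed.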
- Is duplicate of Bug #25505: Backport scheduling fixes for cf-execd added
- Status changed from New to Resolved