Bug #25457

closed

One out of 1000 agent runs is missed

Added by François ARMAND 15 days ago. Updated 1 day ago.

Status: Resolved
Priority: 1 (highest)
Category: Agent
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: To do
Regression: No

Description

On a user installation with a large number of nodes, we observed that around 0.1% of cfengine agent runs are missing.
It does not seem to be linked to a particular node OS, particular hardware, or anything else specific.

On that installation, the splaytime is 7 min for an agent run interval of 15 min.

It was observed on 7.3, but there is no reason it would not also happen on 8.1.

It also happens on a smaller platform with only 1000 nodes (it just takes more care to spot).


Related issues 1 (1 open, 0 closed)

Is duplicate of Rudder - Bug #25505: Backport scheduling fixes for cf-execd (Pending release, Félix DALLIDET)
Actions #1

Updated by François ARMAND 15 days ago

  • Priority changed from N/A to 1 (highest)
Actions #2

Updated by François ARMAND 8 days ago

The first step would be to identify how we can reproduce it and verify our fix.

At first glance, it could be a "<" in place of a "<=" somewhere, perhaps when cf-execd wakes up exactly at the moment it should start the next run... But this is a guessing game, so we need to be able to test our changes.
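
To make the hypothesis concrete, here is a minimal sketch of how a strict comparison could drop a run. It is not cf-execd code: the function `simulate` and its wake-up model (one check per wake-up, comparing elapsed time to the interval) are assumptions made purely for illustration.

```python
def simulate(wake_times, interval, strict):
    """Return the wake-up times at which a run was actually started.

    Hypothetical model, NOT cf-execd's real logic: the daemon wakes up at
    the given times (in seconds) and starts a run only if enough time has
    elapsed since the previous run.
    """
    executed = []
    last_run = None
    for now in wake_times:
        if last_run is None:
            due = True  # nothing has run yet, so the first wake-up always runs
        else:
            elapsed = now - last_run
            # strict=True models the suspected "<" in place of "<="
            due = elapsed > interval if strict else elapsed >= interval
        if due:
            executed.append(now)
            last_run = now
    return executed


# Wake-ups drift by a second or two, except the last one, which lands
# exactly one interval (900 s) after the previous run.
wakes = [0, 901, 1802, 2702]
print(simulate(wakes, 900, strict=True))   # the last run is skipped
print(simulate(wakes, 900, strict=False))  # every run is started
```

In this model the run is only lost when the wake-up lands exactly on the boundary, which would be consistent with a low but non-zero miss rate.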

Actions #3

Updated by Nicolas CHARLES 8 days ago

On our systems, it happens only on Linux systems.
The common point among skipped runs is that, for a given system, they are all skipped at the exact same second. For example, all skipped runs on our monitoring node come right after a run starting at 00:00 (but that does not mean every run following a 00:00 start is skipped).
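
One way to look for this pattern is to scan a node's run start times for gaps larger than one interval and report the run that precedes each gap. The sketch below assumes the start times are available as epoch seconds; the helper `find_gaps` is hypothetical and not part of Rudder.

```python
def find_gaps(starts, interval):
    """Return (previous_start, missed_count) for each gap in the run history.

    starts:   run start times in seconds, for a single node (assumed input).
    interval: expected number of seconds between two runs.
    """
    gaps = []
    ordered = sorted(starts)
    for prev, cur in zip(ordered, ordered[1:]):
        # Number of intervals between two consecutive runs, rounded to nearest.
        steps = int((cur - prev) / interval + 0.5)
        if steps > 1:
            gaps.append((prev, steps - 1))
    return gaps


# Runs every 900 s with a fixed 7 s offset; the run expected at 1807 is missing.
history = [7, 907, 2707, 3607]
print(find_gaps(history, 900))  # the gap follows the run that started at 907
```

Grouping the reported `previous_start` values by their offset within the interval would show whether skipped runs always follow a run started at the same second.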

Actions #4

Updated by François ARMAND 1 day ago

  • Is duplicate of Bug #25505: Backport scheduling fixes for cf-execd added
Actions #5

Updated by François ARMAND 1 day ago

  • Status changed from New to Resolved