Bug #14258

Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike

Added by Nicolas CHARLES 10 months ago. Updated 7 months ago.

Status:
Released
Priority:
N/A
Category:
Performance and scalability
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Operational - other Techniques | Rudder settings | Plugins
Effort required:
Small
Priority:
97
Tags:

Description

check-rudder-agent is triggered by a cron job every 5 minutes.
When this happens on a server hosting 400 VMs, we get 400 instances of this script running at the same time.

This is an issue because the script also runs cf-promises on the whole promises set, so each run uses some resources.

We ought to:
  1. add a splay to this script - it could be a sleep of a random (or deterministic) duration based on the actual defined splaytime
  2. not run cf-promises at each run - we should not do it more often than the agent run frequency, and it could be done much less often (once per day), as this is a failsafe after the failsafe
  3. (optionally) run this script at the agent run frequency, rather than every 5 minutes
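
Points 1 and 2 above could be sketched as follows. This is only an illustration under stated assumptions, not the fix that was merged: the function names `splay_delay` and `check_is_due`, the idea of keying the delay on the hostname, and the use of a stamp file are all hypothetical. It also assumes `date +%s` is available (common, though not strictly POSIX) and that the shell does 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the actual check-rudder-agent script.

# Print a deterministic delay in [0, max) seconds derived from a key
# (for example the hostname): a given node always gets the same delay,
# while different nodes are spread over the window.
splay_delay() {
  key=$1
  max=$2
  if [ "$max" -le 0 ]; then
    echo 0
    return 0
  fi
  # cksum is POSIX; the first field of its output is a CRC of the input.
  hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $((hash % max))
}

# Return 0 if the expensive check (e.g. cf-promises) is due again:
# the stamp file is missing, unreadable as a number, or older than
# min_age seconds. The caller is expected to write `date +%s` into
# the stamp file after a successful check.
check_is_due() {
  stamp=$1
  min_age=$2
  if [ ! -f "$stamp" ]; then
    return 0
  fi
  last=$(cat "$stamp" 2>/dev/null)
  case $last in
    '' | *[!0-9]*) return 0 ;;
  esac
  now=$(date +%s)
  [ $((now - last)) -ge "$min_age" ]
}
```

A cron-driven script could then call `sleep "$(splay_delay "$(hostname)" 300)"` before doing any work, and wrap the cf-promises call in `if check_is_due "$stamp_file" 86400; then ... fi`, where `$stamp_file` is whatever path the script chooses. A deterministic delay keeps each node's schedule stable from run to run, which a random sleep would not.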

Related issues

Related to Rudder - Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file (Rejected)
Related to Rudder - Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes (Rejected)
Related to Rudder - Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours (Released)

Associated revisions

Revision a94650a7 (diff)
Added by Nicolas CHARLES 9 months ago

Fixes #14258: check-rudder-agent runs every 5 minutes exactly by cron, and can cause spike in resource usage

History

#1

Updated by Nicolas CHARLES 10 months ago

  • Related to Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file added
#2

Updated by François ARMAND 10 months ago

  • Tags set to Sponsored
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 84

Linked to support ticket S10748

#3

Updated by François ARMAND 10 months ago

  • Related to Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes added
#4

Updated by François ARMAND 9 months ago

  • Assignee set to Nicolas CHARLES

We "just" need to add a sleep at the beginning of "rudder agent check", lasting the splay time length. We need to check whether we are in interactive mode, to avoid sleeping in that mode. Alternatively, it could be a new parameter passed in the cron job (like --cron).
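
The "--cron" alternative mentioned above could look roughly like this. It is a sketch only: `has_cron_flag` is a hypothetical helper, and "--cron" is the option name floated in this comment, not a documented option of rudder agent check.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if "--cron" appears among the arguments.
has_cron_flag() {
  for arg in "$@"; do
    if [ "$arg" = "--cron" ]; then
      return 0
    fi
  done
  return 1
}
```

The script would then sleep only when `has_cron_flag "$@"` succeeds, so a manual invocation without the flag starts immediately. Compared with terminal detection, an explicit flag makes the behavior visible in the crontab entry itself.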

#5

Updated by François ARMAND 9 months ago

  • Effort required set to Small
  • Priority changed from 84 to 100
#6

Updated by Nicolas CHARLES 9 months ago

The best way to detect whether we are in interactive mode seems to be using if [ -t 0 ];
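
For illustration, the `[ -t 0 ]` test checks whether file descriptor 0 (stdin) is attached to a terminal; under cron it normally is not. The sketch below shows one way it might gate the sleep. The function name `startup_delay` is hypothetical and this is not the code from the pull request.

```shell
#!/bin/sh
# Hypothetical sketch: choose the startup delay based on interactivity.

# Print the number of seconds to wait before running the check:
# 0 when stdin is a terminal (someone ran the command by hand),
# the given delay otherwise (cron, pipes, redirected stdin).
startup_delay() {
  delay=$1
  if [ -t 0 ]; then
    echo 0
  else
    echo "$delay"
  fi
}
```

A caller could then run `sleep "$(startup_delay 120)"` at the top of the script. One caveat of this approach is that a manual run with stdin redirected (for example from a pipe) is treated as non-interactive and will also sleep.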

#7

Updated by Nicolas CHARLES 9 months ago

  • Status changed from New to In progress
#8

Updated by Nicolas CHARLES 9 months ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-agent/pull/205
#9

Updated by Rudder Quality Assistant 9 months ago

  • Assignee changed from Benoît PECCATTE to Nicolas CHARLES
  • Priority changed from 100 to 99
#10

Updated by Nicolas CHARLES 9 months ago

  • Status changed from Pending technical review to Pending release
#11

Updated by François ARMAND 9 months ago

  • Target version changed from 4.1.20 to 4.1.21
#12

Updated by Vincent MEMBRÉ 7 months ago

  • Subject changed from check-rudder-agent runs every 5 minutes exactly by cron, and can cause spike in resource usage to Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike
  • Priority changed from 99 to 98
#13

Updated by François ARMAND 7 months ago

  • Related to Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours added
#14

Updated by Vincent MEMBRÉ 7 months ago

  • Status changed from Pending release to Released
  • Priority changed from 98 to 97

This bug has been fixed in Rudder 4.1.21, 4.3.11 and 5.0.9 which were released today.
