Bug #14258

Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike

Added by Nicolas CHARLES over 1 year ago. Updated over 1 year ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required: Small
Priority: 97
Tags:

Description

check-rudder-agent is triggered by a cron job every 5 minutes.
When this happens on a server with 400 VMs, we have 400 instances of this script running at the same time.

This is an issue because the script also runs cf-promises on the whole promise set, so each run consumes resources.

We ought to:
  1. have a splay on this script - could be a sleep of a random time (or deterministic time) based on the actual defined splaytime
  2. not run cf-promises at each run - we should not do it more often than the agent run frequency, and it could be done much less often (once per day) as this is a failsafe after the failsafe
  3. (optionally) run this script at the agent run frequency, rather than every 5 minutes
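
Item 2 above could be sketched with a timestamp file that records when the promises were last validated. This is only an illustration, not the code from the actual fix: the stamp file path, the interval variable and both function names are hypothetical, and the real cf-promises invocation is left out.

```shell
#!/bin/sh
# Hypothetical stamp file and interval; neither name comes from Rudder itself.
STAMP_FILE="${STAMP_FILE:-/var/rudder/tmp/check-promises.stamp}"
MIN_INTERVAL="${MIN_INTERVAL:-86400}"   # seconds, i.e. once per day

# Succeeds (exit 0) when no check has been recorded yet, when the stamp is
# not a plain number, or when the last check is at least min_interval old.
# Arguments: stamp file, minimum interval in seconds, current epoch time.
promises_check_due() {
    stamp_file=$1
    min_interval=$2
    now=$3
    [ -r "$stamp_file" ] || return 0
    last=$(cat "$stamp_file")
    case "$last" in
        ''|*[!0-9]*) return 0 ;;   # empty or garbled stamp: check again
    esac
    [ $((now - last)) -ge "$min_interval" ]
}

# Records the epoch time ($2) of a successful check in the stamp file ($1).
record_promises_check() {
    printf '%s\n' "$2" > "$1"
}

# Intended use inside the check script (left commented out on purpose):
#   now=$(date +%s)
#   if promises_check_due "$STAMP_FILE" "$MIN_INTERVAL" "$now"; then
#       # run the existing cf-promises validation here, then on success:
#       record_promises_check "$STAMP_FILE" "$now"
#   fi
```

Passing the current time as an argument keeps the decision easy to test, and treating an unreadable stamp as "due" errs on the side of running the failsafe.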

Related issues

Related to Rudder - Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file (Rejected)
Related to Rudder - Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes (Rejected)
Related to Rudder - Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours (Released, Nicolas CHARLES)
#1

Updated by Nicolas CHARLES over 1 year ago

  • Related to Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file added
#2

Updated by François ARMAND over 1 year ago

  • Tags set to Sponsored
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 84

Linked to support ticket S10748

#3

Updated by François ARMAND over 1 year ago

  • Related to Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes added
#4

Updated by François ARMAND over 1 year ago

  • Assignee set to Nicolas CHARLES

We "just" need to add a sleep at the beginning of "rudder agent check" with the splay time length. We need to check whether we are in interactive mode, to avoid sleeping in that mode. Or it could be a new parameter passed by the cron job (like --cron).
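
A minimal sketch of that idea, assuming the cron entry passes a --cron flag as suggested in this comment (the flag is a proposal here, not a confirmed option of "rudder agent check"). The function names and the SPLAYTIME variable are hypothetical. The delay is derived from a checksum of the host name, so a given node always waits the same amount while different nodes spread out.

```shell
#!/bin/sh
# Prints a delay in seconds in the range [0, max), derived from a host
# identifier. The same identifier always yields the same delay.
# Arguments: host identifier, maximum splay in seconds.
splay_delay() {
    host_id=$1
    max=$2
    [ "$max" -gt 0 ] || { echo 0; return; }
    # cksum prints "<crc> <size>" for stdin; keep only the CRC.
    sum=$(printf '%s' "$host_id" | cksum | cut -d' ' -f1)
    echo $((sum % max))
}

# Succeeds when --cron (hypothetical flag) is among the arguments.
has_cron_flag() {
    for arg in "$@"; do
        [ "$arg" = "--cron" ] && return 0
    done
    return 1
}

# Intended use at the top of the check script (left commented out on purpose;
# SPLAYTIME is a hypothetical variable holding the splay length in seconds):
#   if has_cron_flag "$@"; then
#       sleep "$(splay_delay "$(hostname)" "$SPLAYTIME")"
#   fi
```

A deterministic delay keeps each node's check on a regular schedule; a random delay would spread the load just as well but would vary the time between two consecutive checks on the same node.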

#5

Updated by François ARMAND over 1 year ago

  • Effort required set to Small
  • Priority changed from 84 to 100
#6

Updated by Nicolas CHARLES over 1 year ago

The best way to detect whether we are in interactive mode seems to be using "if [ -t 0 ]".
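
As a rough illustration of that test, the sketch below skips the delay when standard input is a terminal. The function names are hypothetical and this is not the code from the pull request.

```shell
#!/bin/sh
# Succeeds when standard input (file descriptor 0) is attached to a terminal,
# which is normally the case when a person runs the command by hand and not
# the case when cron runs it.
is_interactive() {
    [ -t 0 ]
}

# Prints the delay to apply: 0 when interactive, otherwise the delay in $1.
effective_delay() {
    if is_interactive; then
        echo 0
    else
        echo "$1"
    fi
}

# Intended use (left commented out on purpose):
#   sleep "$(effective_delay 120)"
```

Note that the test is also false whenever standard input is a pipe or a file, for example when the command is called from another script, so such callers would be delayed as well.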

#7

Updated by Nicolas CHARLES over 1 year ago

  • Status changed from New to In progress
#8

Updated by Nicolas CHARLES over 1 year ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-agent/pull/205
#9

Updated by Rudder Quality Assistant over 1 year ago

  • Assignee changed from Benoît PECCATTE to Nicolas CHARLES
  • Priority changed from 100 to 99
#10

Updated by Nicolas CHARLES over 1 year ago

  • Status changed from Pending technical review to Pending release
#11

Updated by François ARMAND over 1 year ago

  • Target version changed from 4.1.20 to 4.1.21
#12

Updated by Vincent MEMBRÉ over 1 year ago

  • Subject changed from check-rudder-agent runs every 5 minutes exactly by cron, and can cause spike in resource usage to Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike
  • Priority changed from 99 to 98
#13

Updated by François ARMAND over 1 year ago

  • Related to Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours added
#14

Updated by Vincent MEMBRÉ over 1 year ago

  • Status changed from Pending release to Released
  • Priority changed from 98 to 97

This bug has been fixed in Rudder 4.1.21, 4.3.11 and 5.0.9 which were released today.
