Bug #14258

closed

Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike

Added by Nicolas CHARLES about 5 years ago. Updated almost 2 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required: Small
Priority: 86
Name check:
Fix check:
Regression:

Description

check-rudder-agent is triggered by a cron job every 5 minutes.
When this happens on a server hosting 400 VMs, we get 400 instances of this script running at the same time.

This is an issue because the script also runs cf-promises on the whole promise set, so each instance consumes some resources.

We ought to:
  1. add a splay to this script: it could sleep for a random (or deterministic) time based on the actual defined splaytime
  2. not run cf-promises at each run: we should not do it more often than the agent run frequency, and it could be done much less often (once per day), as this is a failsafe after the failsafe
  3. (optionally) run this script at the agent run frequency, rather than every 5 minutes
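
Points 1 and 2 above can be sketched in shell as follows. This is an illustration under stated assumptions, not the change that was eventually merged: the helper names (splay_delay, check_is_due, mark_check_done) and the stamp-file mechanism are hypothetical, and date +%s is assumed to be available (it is widely supported but not strictly POSIX).

```shell
#!/bin/sh
# Sketch only: helper names and the stamp-file approach are hypothetical,
# not taken from the rudder-agent sources.

# Print a stable delay in [0, max) seconds derived from a key such as the
# hostname. cksum is deterministic, so a given host always gets the same offset.
splay_delay() {
  key="$1"
  max="$2"
  [ "$max" -gt 0 ] || { echo 0; return; }
  sum=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( sum % max ))
}

# Succeed when the stamp file is missing, unreadable as a number, or holds a
# timestamp at least min_age seconds old.
check_is_due() {
  stamp="$1"
  min_age="$2"
  [ -f "$stamp" ] || return 0
  last=$(cat "$stamp")
  case "$last" in
    ''|*[!0-9]*) return 0 ;;
  esac
  now=$(date +%s)
  [ $(( now - last )) -ge "$min_age" ]
}

# Record the current time (seconds since the epoch) in the stamp file.
mark_check_done() {
  date +%s > "$1"
}
```

A cron-driven script could sleep for the number of seconds printed by splay_delay "$(hostname)" 300, then run the expensive validation only when check_is_due succeeds for an interval such as 86400 seconds, calling mark_check_done afterwards. Both values are illustrative.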

Related issues 3 (0 open, 3 closed)

Related to Rudder - Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file (Rejected)
Related to Rudder - Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes (Rejected)
Related to Rudder - Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours (Released, Nicolas CHARLES)
Actions #1

Updated by Nicolas CHARLES about 5 years ago

  • Related to Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file added
Actions #2

Updated by François ARMAND about 5 years ago

  • Tags set to Sponsored
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 84

Linked to support ticket S10748

Actions #3

Updated by François ARMAND about 5 years ago

  • Related to Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes added
Actions #4

Updated by François ARMAND about 5 years ago

  • Assignee set to Nicolas CHARLES

We "just" need to add a sleep at the beginning of "rudder agent check", with the splay time as its length. We need to check whether we are in interactive mode to avoid sleeping in that mode. Or it could be a new parameter in the cron (like --cron).
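
The --cron parameter idea could look roughly like the sketch below. The flag comes from the comment above, but whether and how it was implemented is not stated here; wants_splay and maybe_sleep are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: wants_splay and maybe_sleep are hypothetical helpers, and
# --cron is the flag suggested in the comment, not a confirmed option.

# Succeed when --cron appears anywhere among the arguments.
wants_splay() {
  for arg in "$@"; do
    [ "$arg" = "--cron" ] && return 0
  done
  return 1
}

# Sleep for the given number of seconds only when --cron was passed,
# so that a manual invocation returns immediately.
maybe_sleep() {
  delay="$1"
  shift
  if wants_splay "$@"; then
    sleep "$delay"
  fi
}
```

With this approach only the crontab entry passes --cron, so the delay never applies to someone running the check by hand.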

Actions #5

Updated by François ARMAND about 5 years ago

  • Effort required set to Small
  • Priority changed from 84 to 100
Actions #6

Updated by Nicolas CHARLES about 5 years ago

The best way to detect whether we are running interactively seems to be if [ -t 0 ]; which is true when stdin is a terminal.
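
As a minimal sketch of that test, the function below reports which mode the script is in. The name run_mode and the labels it prints are hypothetical; only the [ -t 0 ] test comes from the comment.

```shell
#!/bin/sh
# Sketch only: run_mode is a hypothetical helper name.

# Print "interactive" when file descriptor 0 (stdin) is a terminal,
# "batch" otherwise. Jobs started by cron typically have no terminal on stdin.
run_mode() {
  if [ -t 0 ]; then
    echo interactive
  else
    echo batch
  fi
}
```

A script could then skip the splay sleep whenever run_mode prints interactive.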

Actions #7

Updated by Nicolas CHARLES about 5 years ago

  • Status changed from New to In progress
Actions #8

Updated by Nicolas CHARLES about 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-agent/pull/205
Actions #9

Updated by Rudder Quality Assistant about 5 years ago

  • Assignee changed from Benoît PECCATTE to Nicolas CHARLES
  • Priority changed from 100 to 99
Actions #10

Updated by Nicolas CHARLES about 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #11

Updated by François ARMAND about 5 years ago

  • Target version changed from 4.1.20 to 4.1.21
Actions #12

Updated by Vincent MEMBRÉ about 5 years ago

  • Subject changed from check-rudder-agent runs every 5 minutes exactly by cron, and can cause spike in resource usage to Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike
  • Priority changed from 99 to 98
Actions #13

Updated by François ARMAND about 5 years ago

  • Related to Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours added
Actions #14

Updated by Vincent MEMBRÉ about 5 years ago

  • Status changed from Pending release to Released
  • Priority changed from 98 to 97

This bug has been fixed in Rudder 4.1.21, 4.3.11 and 5.0.9 which were released today.

Actions #15

Updated by Alexis Mousset almost 2 years ago

  • Priority changed from 97 to 86