Bug #14258

Cron job checking rudder agent health is run every 5 minutes exactly, causing resource usage spikes

Added by Nicolas CHARLES over 1 year ago. Updated about 1 year ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required: Small
Priority: 97
Tags:

Description

check-rudder-agent is triggered by a cron job every 5 minutes.
When this happens on a server with 400 VMs, we have 400 instances of this script running at the same time.

This is an issue because the script also runs cf-promises on the whole promise set, so each run uses some resources; with 400 instances starting at the same moment, the combined load shows up as a spike.

We ought to:
  1. add a splay to this script - it could be a sleep of a random (or deterministic) duration based on the splaytime actually defined
  2. not run cf-promises at each run - we should not run it more often than the agent run frequency, and it could be run much less often (once per day), as this is a failsafe after the failsafe
  3. (optionally) run this script at the agent run frequency, rather than every 5 minutes
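
Points 1 and 2 could be sketched as follows. This is only an illustration in Python, not the actual check-rudder-agent code (which is not shown in this ticket): the function names, the use of a node identifier as hash input, and the timestamp-based throttle are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def deterministic_splay(node_id: str, max_splay_seconds: int) -> int:
    """Return a stable delay in [0, max_splay_seconds) derived from node_id.

    Hypothetical helper: hashing the node identifier gives each node its own
    fixed offset, so nodes sharing a host do not all start at the same second.
    """
    if max_splay_seconds <= 0:
        return 0
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, reduced to the window.
    return int.from_bytes(digest[:8], "big") % max_splay_seconds


def should_run_syntax_check(
    last_check: Optional[float], now: float, min_interval_seconds: float
) -> bool:
    """Decide whether the expensive cf-promises check is due.

    Hypothetical helper: last_check is the timestamp of the previous check
    (None if it never ran); the check is due once min_interval_seconds
    have elapsed since then.
    """
    if last_check is None:
        return True
    return now - last_check >= min_interval_seconds
```

A caller would sleep for `deterministic_splay(node_id, splaytime)` seconds before doing any work, and would only invoke cf-promises when `should_run_syntax_check(...)` returns true, for instance with `min_interval_seconds=86400` for the once-per-day variant. A deterministic offset keeps each node's schedule predictable, whereas a random sleep would spread the load equally well but make individual runs harder to trace.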

Related issues

Related to Rudder - Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file (Rejected)
Related to Rudder - Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes (Rejected)
Related to Rudder - Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours (Released, Nicolas CHARLES)
