Bug #14258
closed
Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike
Added by Nicolas CHARLES almost 6 years ago.
Updated over 2 years ago.
Category:
Performance and scalability
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Operational - other Techniques | Rudder settings | Plugins
Description
check-rudder-agent is triggered by a cron job, every 5 minutes
When this happens on a server hosting 400 VMs, we get 400 instances of this script running at the same time.
This is an issue because the script also runs cf-promises on the whole promise set, so each instance consumes resources.
We ought to:
- have a splay on this script: it could sleep for a random (or deterministic) time based on the actual defined splaytime
- don't run cf-promises at each run: we should not do it more often than the agent run frequency, and it could be done much less often (once per day), as this is a failsafe after the failsafe
- (optionally) run this script at the agent run frequency, rather than every 5 minutes
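The first two proposals above could be sketched roughly as follows. This is only an illustration, not the code that was merged: the function names (splay_delay, should_validate, mark_validated), the stamp file, and the use of cksum to derive a stable offset are all assumptions made for this example. It also relies on date +%s, which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helpers: names and behaviour are illustrative,
# not taken from the rudder-agent sources.

# Deterministic splay: map a node identifier to a stable offset
# in the range [0, max_seconds).
splay_delay() {
    node_id=$1
    max_seconds=$2
    if [ "$max_seconds" -le 0 ]; then
        echo 0
        return 0
    fi
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$node_id" | cksum | cut -d ' ' -f 1)
    echo $(( crc % max_seconds ))
}

# Succeeds when the expensive validation should run again, i.e. the
# stamp file is missing, unreadable as a number, or old enough.
should_validate() {
    stamp_file=$1
    min_age_seconds=$2
    [ -f "$stamp_file" ] || return 0
    last=$(cat "$stamp_file")
    case "$last" in
        ''|*[!0-9]*) return 0 ;;
    esac
    now=$(date +%s)
    [ $(( now - last )) -ge "$min_age_seconds" ]
}

# Record the time of the last successful validation.
mark_validated() {
    date +%s > "$1"
}
```

Under these assumptions, a cron wrapper would sleep for the value printed by splay_delay (given the hostname and the configured splaytime in seconds), and would only run cf-promises when should_validate succeeds for a minimum age such as 86400 seconds, calling mark_validated afterwards. Because the offset is derived from the node identifier, each VM keeps the same slot from run to run while the 400 VMs are spread over the window.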
- Related to Bug #4768: check-rudder-agent should take splaytime into account when checking the last input update file added
- Tags set to Sponsored
- Severity set to Major - prevents use of part of Rudder | no simple workaround
- User visibility set to Operational - other Techniques | Rudder settings | Plugins
- Priority changed from 0 to 84
Linked to support ticket S10748
- Related to Bug #11919: rudder agent check runs synchronously on all nodes, causing CPU spikes added
- Assignee set to Nicolas CHARLES
We "just" need to add a sleep at the beginning of "rudder agent check" with the splay time length. We need to check whether we are in interactive mode to avoid sleeping in that mode. Alternatively, it could be a new parameter passed by the cron job (like --cron).
- Effort required set to Small
- Priority changed from 84 to 100
The best way to detect interactive mode seems to be: if [ -t 0 ];
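A minimal sketch of that check, assuming the splay length is already known in seconds. The helper names is_interactive and effective_delay are hypothetical and only serve to show how [ -t 0 ] could gate the sleep; they are not the names used in the pull request.

```shell
#!/bin/sh
# Hypothetical sketch: helper names are illustrative only.

# Succeeds when stdin is attached to a terminal, which is the case
# for a command typed by a user and not for a job started by cron.
is_interactive() {
    [ -t 0 ]
}

# Print the delay to apply before the check: no delay for an
# interactive run, the full splay otherwise.
effective_delay() {
    splay_seconds=$1
    if is_interactive; then
        echo 0
    else
        echo "$splay_seconds"
    fi
}
```

The check script would then start with something like sleep "$(effective_delay 120)". One caveat of this approach is that a user who pipes input into the command from a terminal also has a non-terminal stdin and would get the sleep, which is a case an explicit parameter such as --cron would avoid.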
- Status changed from New to In progress
- Status changed from In progress to Pending technical review
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-agent/pull/205
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
- Priority changed from 100 to 99
- Status changed from Pending technical review to Pending release
- Target version changed from 4.1.20 to 4.1.21
- Subject changed from check-rudder-agent runs every 5 minutes exactly by cron, and can cause spike in resource usage to Cron job checking rudder agent health, is ran every 5 minutes exactly, causing resource usage spike
- Priority changed from 99 to 98
- Related to Bug #14644: When installing rudder-agent, there's a long wait of run interval/2, so up to several hours added
- Status changed from Pending release to Released
- Priority changed from 98 to 97
This bug has been fixed in Rudder 4.1.21, 4.3.11 and 5.0.9 which were released today.
- Priority changed from 97 to 86