Bug #7290
Inventory upload is not distributed uniformly (closed)
Description
Since the hammering was solved in a linked issue, I am modifying the ticket text to address the issue from the other angle:
The uploads are not distributed randomly. There are very big spikes at the same times every day:
rudder:/var/rudder/inventories/accepted-nodes-updates # ll -ltr | gawk '/Dec 9/ {print $8}' | sort | uniq -c | sort -n -k1 -r | head -n50
    900 03:10
    886 00:05
    883 01:05
    715 04:10
    672 05:15
    491 02:10
    408 02:05
    150 05:16
    130 04:11
     88 04:15
     71 03:11
     62 05:10
     53 08:07
     51 00:00
     50 10:12
     42 07:07
     38 09:12
     33 06:02
     30 05:11
     24 06:15
     22 11:11
     22 10:11
     22 03:05
     21 11:16
     20 04:13
     20 01:08
     19 07:08
     19 03:13
     19 00:08
     15 01:06
     14 09:07
     14 02:08
     12 03:15
     11 11:12
     11 02:06
     11 01:10
Updated by Vincent MEMBRÉ about 9 years ago
- Status changed from New to Discussion
- Assignee set to Janos Mattyasovszky
Thanks Janos!
Do you think adding a property to set the maximum number of inventories to process would help you with this?
Updated by Janos Mattyasovszky about 9 years ago
Well, I don't know :-)
It depends on whether processing the inventories is more CPU-bound, in which case it would scale with more vCPUs, or more LDAP-bound, and on how a system would behave if you could set the value to, for example, 200...
What was the reasoning behind setting it to 50 in the first place? :)
Updated by Janos Mattyasovszky about 9 years ago
- Tracker changed from Question to Bug
- Assignee changed from Janos Mattyasovszky to Vincent MEMBRÉ
- Found in version(s) old 3.0.9 added
It would make sense to process the received inventories as soon as they arrive. Otherwise there is no point in splaying the upload, because the lower the run interval of the agent is, the more you have to process at once.
The splay time is somehow not producing an even distribution: we have hotspots, which can overload the processing. This is made worse if you have a relatively low agent run interval (15-30 minutes)...
rootserver:/var/rudder/inventories/received # ll -ltr | gawk '/Dec 2/ {print $8}' | sort | uniq -c | sort -nrk 1 | head -n 30
    527 01:05
    500 03:10
    441 04:10
    372 00:05
    328 05:14
    299 02:10
    283 05:15
    168 02:05
    156 00:06
    120 03:11
     62 01:06
     60 00:04
     58 02:09
     57 04:14
     47 04:09
     44 04:15
     44 03:09
     34 02:04
     17 06:15
     11 00:00
      9 01:09
      8 05:10
      7 03:14
      7 00:09
      6 03:05
      5 08:14
      5 06:14
      5 03:15
      4 05:20
      4 05:19
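One way to put a number on hotspots like these is to compute what share of all uploads falls into the busiest few minutes. The following is only a sketch: `peak_share` is a hypothetical helper name, and it assumes its input is the `count HH:MM` listing already sorted by descending count, as produced by the pipeline above without the final `head`.

```shell
# peak_share N
# Reads "count HH:MM" lines (sorted by descending count) on stdin and
# prints the integer percentage of all uploads that landed in the N
# busiest minutes. Hypothetical helper, not part of Rudder.
peak_share() {
    awk -v n="$1" '
        { total += $1; if (NR <= n) top += $1 }
        END { if (total > 0) printf "%d\n", (100 * top) / total }
    '
}

# Made-up two-line sample: 900 of 1000 uploads in the busiest minute.
printf '900 03:10\n100 00:05\n' | peak_share 1   # prints 90
```

With a perfectly even splay over a long window, the share for a handful of minutes would stay small, so a high value for a small N is a quick indicator of the spikes described here.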
Updated by Janos Mattyasovszky about 9 years ago
For example, with inotifywait you could monitor the upload directories in real time for new files and pipe them into a loop that processes them.
# inotifywait --quiet --monitor --exclude uuid.hive --format '%w%f' -e CLOSE_WRITE \
    /var/rudder/inventories/incoming/ /var/rudder/inventories/accepted-nodes-updates/ | \
  while read INVFILE; do
    /var/rudder/tools/send-clean.sh http://localhost:8080/endpoint/upload/ "$INVFILE" \
        /var/rudder/inventories/received/ /var/rudder/inventories/failed/ 2> /dev/null
  done
I know this is pretty far from a bullet-proof daemon for processing inventories, but I still think the processing should be decoupled from the agent's run, as otherwise the agent run will clog the inventory processing. This gets worse as the agent run interval increases, and worse again if you use relays, since they also act as a delay: they collect a run's worth of inventories and push them in one batch to the upstream server.
Updated by Janos Mattyasovszky about 9 years ago
Janos Mattyasovszky wrote:
It would make sense to process the received inventories as soon as they arrive. Otherwise there is no point in splaying the upload, because the lower the run interval of the agent is, the more you have to process at once.
I meant: the higher the run interval is, the more you have to process at once.
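As a rough illustration of that relationship: if uploads arrive evenly over some window but are only picked up once per agent run, the number waiting at each run grows linearly with the run interval. The sketch below is back-of-the-envelope arithmetic only. The node count and window length are made-up values, and `batch_size` is a hypothetical helper.

```shell
# batch_size NODES WINDOW_MIN INTERVAL_MIN
# Approximate number of inventories accumulated between two agent runs,
# assuming NODES uploads spread evenly over WINDOW_MIN minutes.
# Integer division, so the result is rounded down.
batch_size() {
    echo $(( $1 * $3 / $2 ))
}

# Hypothetical figures: 6000 nodes uploading over a 360-minute window.
batch_size 6000 360 5    # prints 83
batch_size 6000 360 30   # prints 500
```

Any hotspot in the upload schedule multiplies these figures further, since the arrivals are then far from even.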
Updated by Janos Mattyasovszky about 9 years ago
Just to have enough to read :-), today's upload times are again a little bit strange and oddly off:
rootserver:/var/rudder/inventories/received # ll -ltr | gawk '/Dec 4/ {print $8}' | sort | uniq -c | sort -nrk 1 | head -n 15
    728 03:10
    512 01:06
    385 00:06
    369 02:10
    366 04:10
    358 05:15
    321 05:16
    301 04:11
    204 00:05
    180 01:05
    164 02:05
    114 02:06
     92 00:01
     82 04:15
     64 03:11
Updated by Florian Heigl about 9 years ago
The last entry is not a perfect distribution, but it is a lot better than the first time... I think we need to look at those numbers per day :/
Updated by Olivier Mauras almost 9 years ago
I agree that this should be an option that can be modified in webapp config.
It certainly doesn't make sense to take that much time to process that number of inventories.
Updated by Janos Mattyasovszky almost 9 years ago
- Related to User story #7291: Exit sending inventory to rudder-webapp if the queue is full added
Updated by Jonathan CLARKE almost 9 years ago
This setting can be changed in the config file /opt/rudder/etc/inventory-web.properties:
#
# Max number of reports waiting to be processed internally.
# For a rough estimation, you can consider that a report in queue
# takes 5 MB, so to handle 50 (default), the application will
# need around 250 MB of spare memory.
#
waiting.inventory.queue.size=50
You can increase it and observe the impact on memory for the endpoint. Please bear in mind that any inventories accepted by the endpoint, and stored in memory, will be lost if the endpoint is killed before they are processed (no reason this should happen, unless you kill it deliberately, the OS points the OOM killer at it or the machine shuts off).
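Before raising the value, the memory impact can be estimated with the rule of thumb from the config comment (about 5 MB per queued report). The helper name `queue_memory_mb` is hypothetical, and the result is only as accurate as that 5 MB estimate.

```shell
# queue_memory_mb QUEUE_SIZE
# Rough spare memory (in MB) needed for QUEUE_SIZE queued inventories,
# using the ~5 MB per report estimate from inventory-web.properties.
queue_memory_mb() {
    echo $(( $1 * 5 ))
}

queue_memory_mb 50    # prints 250 (the default queue size)
queue_memory_mb 200   # prints 1000 (the value suggested earlier in this thread)
```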
Updated by Janos Mattyasovszky about 8 years ago
The problem of the bad splayclass for the inventory upload schedule is still present. I would not call this an evenly distributed upload schedule:
rudder:/var/rudder/inventories/accepted-nodes-updates # ll -ltr | gawk '/Dec 9/ {print $8}' | sort | uniq -c | sort -n -k1 -r | head -n50
    900 03:10
    886 00:05
    883 01:05
    715 04:10
    672 05:15
    491 02:10
    408 02:05
    150 05:16
    130 04:11
     88 04:15
     71 03:11
     62 05:10
     53 08:07
     51 00:00
     50 10:12
     42 07:07
     38 09:12
     33 06:02
     30 05:11
     24 06:15
     22 11:11
     22 10:11
     22 03:05
     21 11:16
     20 04:13
     20 01:08
     19 07:08
     19 03:13
     19 00:08
     15 01:06
     14 09:07
     14 02:08
     12 03:15
     11 11:12
     11 02:06
     11 01:10
      8 01:09
      7 05:20
      7 00:10
      5 08:15
      5 06:07
      5 06:06
      5 02:11
      4 06:16
      4 04:14
      3 11:17
      3 09:11
      3 09:10
      3 08:06
      3 07:10
Updated by Janos Mattyasovszky about 8 years ago
- Subject changed from Inventory processing to Inventory upload is not distributed uniformly
- How to reproduce updated (diff)
- Found in version(s) old 3.1.11 added
- Found in version(s) old deleted (3.0.9)
Updated by Janos Mattyasovszky about 8 years ago
Could we create a custom technique which logs into a file according to differently seeded splayclasses?
This way we could identify which one is best suited to randomizing actions, so that there are no peaks.
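The same idea can be tried offline before writing a technique: hash each host name together with a candidate seed, map the hash to a time slot, and compare the size of the fullest slot per seed. The sketch below uses `cksum` as a stand-in hash, which is an assumption made purely for illustration and is not the hash CFEngine's splayclass uses. The host names, seeds, and function names are hypothetical too.

```shell
# splay_slot SEED HOST SLOTS
# Map a host to a slot in 0..SLOTS-1 from the CRC of "SEED" + "HOST".
# cksum is only a stand-in; CFEngine's splayclass hashes differently.
splay_slot() {
    crc=$(printf '%s%s' "$1" "$2" | cksum | cut -d ' ' -f 1)
    echo $(( $crc % $3 ))
}

# peak_for_seed SEED
# Distribute 200 hypothetical hosts over 12 five-minute slots and print
# how many hosts ended up in the fullest slot (lower is more even).
peak_for_seed() {
    i=1
    while [ "$i" -le 200 ]; do
        splay_slot "$1" "node$i.example.com" 12
        i=$((i + 1))
    done | sort -n | uniq -c | sort -nr | head -n 1 | awk '{ print $1 }'
}

# Compare a few made-up seeds; with 200 hosts and 12 slots, a perfectly
# even spread would put 16 or 17 hosts in each slot.
for seed in inventory upload splay; do
    echo "$seed: $(peak_for_seed "$seed")"
done
```

Running the real comparison on the nodes themselves, as proposed above, would still be needed, since only that exercises the actual splayclass hash and the real host names and addresses.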
Updated by Benoît PECCATTE almost 8 years ago
- Found in version(s) 3.1.11 added
Updated by Benoît PECCATTE almost 8 years ago
- Found in version(s) old deleted (3.1.11)
Updated by Jonathan CLARKE almost 8 years ago
- Severity set to Minor - inconvenience | misleading | easy workaround
- User visibility set to Operational - other Techniques | Technique editor | Rudder settings
- Priority set to 14
Updated by Jonathan CLARKE over 7 years ago
- Status changed from Discussion to New
Updated by Nicolas CHARLES over 5 years ago
- Related to Bug #6718: If the agent schedule is not every 5 minutes, inventory may not be sent anymore added
Updated by Nicolas CHARLES over 5 years ago
- Tags set to Sponsored
- Priority changed from 27 to 0
Updated by Nicolas CHARLES over 5 years ago
This patch was used with partial success:
+    "hex_integer" string => string_tail("${sys.key_digest}", "1");
+    "inventory_time_quarter" string => execresult("echo ${const.dollar}((0x${hex_integer} / 4))", "useshell");
+
+  classes:
+    "need_Q1" expression => strcmp("${inventory_time_quarter}","0");
+    "need_Q2" expression => strcmp("${inventory_time_quarter}","1");
+    "need_Q3" expression => strcmp("${inventory_time_quarter}","2");
+    "need_Q4" expression => strcmp("${inventory_time_quarter}","3");
+    "quarter" expression => "(Q1.need_Q1)|(Q2.need_Q2)|(Q3.need_Q3)|(Q4.need_Q4)";
     "splaying" expression => splayclass("${sys.host}${sys.ipv4}","hourly");
     "inventory_run_selection" select_class => { "@{computeInventoryTime.inventory_time_selection}"};
@@ -49,7 +58,7 @@ bundle agent computeInventoryTime
     # Inventory will be during the night, at the hour selected, with a splay is this is the default schedule, else at the first run during the selected hour
     # if the interval is less than one hour, else at the first run of the night
-    "inventory_time" expression => "Night.((splaying.default_schedule.inventory_hour_selection)|(!default_schedule.less_than_one_hour_interval.inventory_hour_selection)|(!less_than_one_hour_interval))",
+    "inventory_time" expression => "Night.((splaying.default_schedule.inventory_hour_selection)|(!default_schedule.less_than_one_hour_interval.inventory_hour_selection.quarter)|(!less_than_one_hour_interval))",
     scope => "namespace";
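As I read the patch, it takes the last character of the node's key digest, treats it as a hex digit, and integer-divides it by 4 to get a value from 0 to 3. That value selects which of the quarter-of-the-hour classes (Q1 to Q4) the node must be in for the non-default-schedule branch to fire. The shell sketch below reproduces only that arithmetic. The digest value is made up, and `inventory_quarter` is a hypothetical name.

```shell
# inventory_quarter DIGEST
# Print the quarter index (0..3) derived from the last hex digit of
# DIGEST, using the same arithmetic as the patch: 0x<digit> / 4.
inventory_quarter() {
    digest=$1
    last=${digest#"${digest%?}"}    # keep only the final character
    echo $(( 0x$last / 4 ))
}

# Hypothetical digest, for illustration only.
inventory_quarter "MD5=0123456789abcdef0123456789abcde7"   # prints 1

# Each of the 16 hex digits maps to one of four quarters, four digits each.
for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
    printf '%s -> need_Q%s\n' "$d" "$(( $(inventory_quarter "$d") + 1 ))"
done
```

Since every quarter receives exactly four of the sixteen possible digits, nodes are split evenly across the quarters as long as the last digit of the key digest is itself evenly distributed.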
Updated by Nicolas CHARLES over 4 years ago
- Status changed from New to In progress
- Assignee set to Nicolas CHARLES
Updated by Nicolas CHARLES over 4 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-techniques/pull/1594
Updated by Nicolas CHARLES over 4 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder-techniques|7bfc8b865a4a879e17e595dda0df635b8fd6550f.
Updated by Vincent MEMBRÉ over 4 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.1.0~rc1 which was released today.