Bug #7290

closed

Inventory upload is not distributed uniformly

Added by Janos Mattyasovszky over 8 years ago. Updated about 2 years ago.

Status: Released
Priority: N/A
Category: Web - Nodes & inventories
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 0
Name check:
Fix check:
Regression:

Description

Since the hammering was solved in a linked issue, I am modifying the ticket text to address the problem from the other angle:

The uploads are not distributed evenly; there are very large spikes at the same minutes:

rudder:/var/rudder/inventories/accepted-nodes-updates # ll -ltr | gawk '/Dec  9/ {print $8}' | sort | uniq -c | sort -n -k1 -r | head -n50
    900 03:10
    886 00:05
    883 01:05
    715 04:10
    672 05:15
    491 02:10
    408 02:05
    150 05:16
    130 04:11
     88 04:15
     71 03:11
     62 05:10
     53 08:07
     51 00:00
     50 10:12
     42 07:07
     38 09:12
     33 06:02
     30 05:11
     24 06:15
     22 11:11
     22 10:11
     22 03:05
     21 11:16
     20 04:13
     20 01:08
     19 07:08
     19 03:13
     19 00:08
     15 01:06
     14 09:07
     14 02:08
     12 03:15
     11 11:12
     11 02:06
     11 01:10

Related issues 2 (0 open, 2 closed)

Related to Rudder - User story #7291: Exit sending inventory to rudder-webapp if the queue is full (Released, Jonathan CLARKE, 2016-04-12)
Related to Rudder - Bug #6718: If the agent schedule is not every 5 minutes, inventory may not be sent anymore (Released, Matthieu CERDA, 2015-08-31)
Actions #1

Updated by Janos Mattyasovszky over 8 years ago

  • Description updated (diff)
Actions #2

Updated by Vincent MEMBRÉ over 8 years ago

  • Status changed from New to Discussion
  • Assignee set to Janos Mattyasovszky

Thanks Janos!

Do you think a property to set the maximum number of inventories processed at once would help here?

Actions #3

Updated by Janos Mattyasovszky over 8 years ago

Well, I don't know :-)

It depends on whether processing the inventories is more CPU-bound, in which case it would scale with more vCPUs, or more LDAP-bound, and on how a system would behave if you had the proper setting and set it to, for example, 200...

What was the reasoning behind setting it to 50 in the first place? :)

Actions #4

Updated by Janos Mattyasovszky over 8 years ago

  • Tracker changed from Question to Bug
  • Assignee changed from Janos Mattyasovszky to Vincent MEMBRÉ
  • Found in version(s) old 3.0.9 added

It would make sense to process the received inventories as soon as they arrive; otherwise there is no point in splaying the upload, because the lower the agent's run interval is, the more you have to process at once.

The splay time somehow does not produce an even distribution: we have hotspots, which can overload the processing, and this gets worse with a relatively low agent run interval (15-30 minutes)...

rootserver:/var/rudder/inventories/received # ll -ltr | gawk '/Dec  2/ {print $8}' | sort | uniq -c | sort -nrk 1 | head -n 30
    527 01:05
    500 03:10
    441 04:10
    372 00:05
    328 05:14
    299 02:10
    283 05:15
    168 02:05
    156 00:06
    120 03:11
     62 01:06
     60 00:04
     58 02:09
     57 04:14
     47 04:09
     44 04:15
     44 03:09
     34 02:04
     17 06:15
     11 00:00
      9 01:09
      8 05:10
      7 03:14
      7 00:09
      6 03:05
      5 08:14
      5 06:14
      5 03:15
      4 05:20
      4 05:19
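
The per-minute counts above come from parsing `ll` output, which depends on the listing's date format. A minimal sketch of the same histogram built from file modification times, assuming GNU find and GNU date are available; `inventory_minute_histogram` is a hypothetical helper name, not a Rudder tool:

```shell
# Count files per modification minute (HH:MM, local time) for one calendar day.
# Assumes GNU find (-newermt, -printf) and GNU date (-d).
# inventory_minute_histogram is a hypothetical name used for illustration.
inventory_minute_histogram() {
  local dir="$1" day="$2" next_day
  next_day=$(date -d "$day +1 day" +%F) || return 1
  # Keep files modified after midnight of $day and not after the next midnight,
  # print each mtime as HH:MM, then count and sort busiest minute first.
  find "$dir" -maxdepth 1 -type f \
       -newermt "$day 00:00:00" ! -newermt "$next_day 00:00:00" \
       -printf '%TH:%TM\n' |
    sort | uniq -c | sort -k1,1nr -k2,2
}
```

It could be called as `inventory_minute_histogram /var/rudder/inventories/received 2015-12-02 | head -n 30`, which also makes it easy to compare several days side by side.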
Actions #5

Updated by Janos Mattyasovszky over 8 years ago

For example, with inotifywait you could monitor the upload directories for new files in real time and pipe them to a loop that processes them.

# inotifywait --quiet --monitor --exclude uuid.hive --format '%w%f' -e CLOSE_WRITE /var/rudder/inventories/incoming/ /var/rudder/inventories/accepted-nodes-updates/ | \
  while IFS= read -r INVFILE; do
    /var/rudder/tools/send-clean.sh http://localhost:8080/endpoint/upload/ "$INVFILE" /var/rudder/inventories/received/ /var/rudder/inventories/failed/ 2> /dev/null
  done

I know this is pretty far from a bullet-proof daemon for processing inventories, but I still think processing should be decoupled from the agent's run, which otherwise clogs it. The clogging gets worse as the agent run interval increases, and worse again with relays: they add a delay of their own, since they collect a run's worth of inventories and push them to the upstream server in one batch.

Actions #6

Updated by Janos Mattyasovszky over 8 years ago

Janos Mattyasovszky wrote:

It would make sense to process the received inventories as soon as they arrive; otherwise there is no point in splaying the upload, because the lower the agent's run interval is, the more you have to process at once.

I meant: the higher the run interval is, the more you have to process at once.

Actions #7

Updated by Janos Mattyasovszky over 8 years ago

Just to have enough to read :-), today's upload times are again a little bit strange and oddly off:

rootserver:/var/rudder/inventories/received # ll -ltr | gawk '/Dec  4/ {print $8}' | sort | uniq -c | sort -nrk 1 | head -n 15
    728 03:10
    512 01:06
    385 00:06
    369 02:10
    366 04:10
    358 05:15
    321 05:16
    301 04:11
    204 00:05
    180 01:05
    164 02:05
    114 02:06
     92 00:01
     82 04:15
     64 03:11
Actions #8

Updated by Florian Heigl over 8 years ago

The last entry is not a perfect distribution, but it is a lot better than the first one... I think we need to look at those numbers per day :/

Actions #9

Updated by Olivier Mauras over 8 years ago

I agree that this should be an option that can be modified in webapp config.
It certainly doesn't make sense to take that much time to process that number of inventories.

Actions #10

Updated by Janos Mattyasovszky about 8 years ago

  • Related to User story #7291: Exit sending inventory to rudder-webapp if the queue is full added
Actions #11

Updated by Jonathan CLARKE about 8 years ago

This setting can be changed in the config file /opt/rudder/etc/inventory-web.properties:

#
# Max number of reports waiting to be processed internally.
# For a rough estimation, you can consider that a report in queue
# takes 5 MB, so to handle 50 (default), the application will
# need around 250 MB of spare memory.
#
waiting.inventory.queue.size=50

You can increase it and observe the impact on memory for the endpoint. Please bear in mind that any inventories accepted by the endpoint, and stored in memory, will be lost if the endpoint is killed before they are processed (no reason this should happen, unless you kill it deliberately, the OS points the OOM killer at it or the machine shuts off).
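
As a worked example of the estimate in that config comment, here is a minimal sketch that multiplies a candidate queue size by the 5 MB per queued report quoted above. `queue_memory_mb` is a hypothetical helper name, and the result is only as accurate as that rough 5 MB figure:

```shell
# Rough spare-memory estimate (in MB) for a given waiting.inventory.queue.size.
# Uses the 5 MB-per-queued-report figure from the config comment as the default.
# queue_memory_mb is a hypothetical name used for illustration.
queue_memory_mb() {
  local queue_size="$1" mb_per_report="${2:-5}"
  echo $(( queue_size * mb_per_report ))
}
```

`queue_memory_mb 50` gives the 250 MB stated for the default, and `queue_memory_mb 200` gives 1000 MB for the queue size of 200 floated earlier in the thread.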

Actions #12

Updated by Janos Mattyasovszky over 7 years ago

The problem of the bad splayclass for the inventory upload schedule is still present. I would not call this an evenly distributed upload schedule:

rudder:/var/rudder/inventories/accepted-nodes-updates # ll -ltr | gawk '/Dec  9/ {print $8}' | sort | uniq -c | sort -n -k1 -r | head -n50
    900 03:10
    886 00:05
    883 01:05
    715 04:10
    672 05:15
    491 02:10
    408 02:05
    150 05:16
    130 04:11
     88 04:15
     71 03:11
     62 05:10
     53 08:07
     51 00:00
     50 10:12
     42 07:07
     38 09:12
     33 06:02
     30 05:11
     24 06:15
     22 11:11
     22 10:11
     22 03:05
     21 11:16
     20 04:13
     20 01:08
     19 07:08
     19 03:13
     19 00:08
     15 01:06
     14 09:07
     14 02:08
     12 03:15
     11 11:12
     11 02:06
     11 01:10
      8 01:09
      7 05:20
      7 00:10
      5 08:15
      5 06:07
      5 06:06
      5 02:11
      4 06:16
      4 04:14
      3 11:17
      3 09:11
      3 09:10
      3 08:06
      3 07:10
Actions #13

Updated by Janos Mattyasovszky over 7 years ago

  • Subject changed from Inventory processing to Inventory upload is not distributed uniformly
  • How to reproduce updated (diff)
  • Found in version(s) old 3.1.11 added
  • Found in version(s) old deleted (3.0.9)
Actions #14

Updated by Janos Mattyasovszky over 7 years ago

  • Description updated (diff)
Actions #15

Updated by Janos Mattyasovszky over 7 years ago

Could we create a custom technique which logs to a file according to differently seeded splayclasses?
This way we could identify which one is best suited to randomizing actions, so that there are no peaks.
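
Before writing such a technique, the idea can be approximated offline. The sketch below assigns each host name to a time slot by hashing a seed plus the host name, then reports the size of the busiest slot for that seed. It rests on loud assumptions: `cksum` is only a stand-in hash and is not what CFEngine's splayclass uses, and `slot_for` and `peak_load` are hypothetical helper names.

```shell
# slot_for SEED HOST SLOTS: map a host to one of SLOTS time slots.
# ASSUMPTION: cksum (POSIX CRC) is a stand-in; CFEngine's splayclass
# uses its own hash, so real slot assignments will differ.
slot_for() {
  local seed="$1" host="$2" slots="$3" crc
  crc=$(printf '%s' "${seed}${host}" | cksum | cut -d' ' -f1)
  echo $(( crc % slots ))
}

# peak_load SEED SLOTS: read host names on stdin, one per line,
# and print how many hosts land in the busiest slot.
peak_load() {
  local seed="$1" slots="$2" host
  while IFS= read -r host; do
    slot_for "$seed" "$host" "$slots"
  done | sort -n | uniq -c | sort -k1,1nr | head -n 1 | awk '{print $1}'
}
```

Feeding the same host list to `peak_load` with several seeds and comparing the results shows which seed spreads that particular population most evenly; the closer the peak is to hosts divided by slots, the flatter the schedule.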

Actions #16

Updated by Benoît PECCATTE over 7 years ago

  • Found in version(s) 3.1.11 added
Actions #17

Updated by Benoît PECCATTE over 7 years ago

  • Found in version(s) old deleted (3.1.11)
Actions #18

Updated by Jonathan CLARKE about 7 years ago

  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority set to 14
Actions #19

Updated by Jonathan CLARKE about 7 years ago

  • Status changed from Discussion to New
Actions #20

Updated by Jonathan CLARKE about 7 years ago

  • Assignee deleted (Vincent MEMBRÉ)
Actions #21

Updated by Benoît PECCATTE almost 7 years ago

  • Priority changed from 14 to 27
Actions #22

Updated by Nicolas CHARLES about 5 years ago

  • Related to Bug #6718: If the agent schedule is not every 5 minutes, inventory may not be sent anymore added
Actions #23

Updated by Nicolas CHARLES about 5 years ago

  • Tags set to Sponsored
  • Priority changed from 27 to 0
Actions #24

Updated by Nicolas CHARLES about 5 years ago

This patch was used with partial success:

+      "hex_integer"              string => string_tail("${sys.key_digest}", "1");
+      "inventory_time_quarter"   string => execresult("echo ${const.dollar}((0x${hex_integer} / 4))", "useshell");
+
+
   classes:
+      "need_Q1"                    expression => strcmp("${inventory_time_quarter}","0");
+      "need_Q2"                    expression => strcmp("${inventory_time_quarter}","1");
+      "need_Q3"                    expression => strcmp("${inventory_time_quarter}","2");
+      "need_Q4"                    expression => strcmp("${inventory_time_quarter}","3");
+      "quarter"                    expression => "(Q1.need_Q1)|(Q2.need_Q2)|(Q3.need_Q3)|(Q4.need_Q4)";
       "splaying"                   expression => splayclass("${sys.host}${sys.ipv4}","hourly");

       "inventory_run_selection" select_class => { "@{computeInventoryTime.inventory_time_selection}"};
@@ -49,7 +58,7 @@ bundle agent computeInventoryTime

        # Inventory will be during the night, at the hour selected, with a splay is this is the default schedule, else at the first run during the selected hour
        # if the interval is less than one hour, else at the first run of the night
-       "inventory_time" expression => "Night.((splaying.default_schedule.inventory_hour_selection)|(!default_schedule.less_than_one_hour_interval.inventory_hour_selection)|(!less_than_one_hour_interval))",
+       "inventory_time" expression => "Night.((splaying.default_schedule.inventory_hour_selection)|(!default_schedule.less_than_one_hour_interval.inventory_hour_selection.quarter)|(!less_than_one_hour_interval))",
                              scope => "namespace";
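
The arithmetic in the patch can be checked outside CFEngine. A minimal sketch, assuming the last character of `${sys.key_digest}` is a hexadecimal digit; `inventory_quarter` is a hypothetical helper name and the digest in the usage note is made up:

```shell
# inventory_quarter DIGEST: reproduce the patch's quarter computation.
# inventory_quarter is a hypothetical name used for illustration.
inventory_quarter() {
  local digest="$1" last
  # Same as string_tail("${sys.key_digest}", "1"): keep the last character.
  last=$(printf '%s' "$digest" | tail -c 1)
  # Same as echo $((0x${hex_integer} / 4)): hex digits 0..f map to 0..3.
  echo $(( 0x$last / 4 ))
}
```

For example, `inventory_quarter "MD5=0123456789abcdef"` prints 3, which the patch maps to `need_Q4`. Sixteen possible hex digits divided by four give four equally sized groups, so nodes with evenly distributed key digests are split evenly across the four quarters of the hour.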

Actions #25

Updated by Nicolas CHARLES about 4 years ago

  • Target version set to 6.1.0~rc1
Actions #26

Updated by Nicolas CHARLES about 4 years ago

  • Status changed from New to In progress
  • Assignee set to Nicolas CHARLES
Actions #27

Updated by Nicolas CHARLES about 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/1594
Actions #28

Updated by Nicolas CHARLES about 4 years ago

  • Status changed from Pending technical review to Pending release
Actions #29

Updated by Vincent MEMBRÉ about 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.1.0~rc1 which was released today.
