User story #7291

closed

Exit sending inventory to rudder-webapp if the queue is full

Added by Janos Mattyasovszky over 5 years ago. Updated about 5 years ago.

Status:
Released
Priority:
N/A
Category:
Web - Nodes & inventories

Description

(Related to #7290)

Apparently, when the inventory-processing queue reaches its limit of 50 concurrent inventories, the service returns a 503 and logs an error.

However, the agent does not react to that and keeps trying to force all uploads through, hammering the service.

# cf-agent -KIC -b sendInventoryToCmdb | grep -w 503
2015-10-18T00:21:19+0200     info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:19+0200    error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:19+0200     info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:19+0200    error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:20+0200     info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:20+0200    error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:20+0200     info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:20+0200    error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22

(some lines were removed).

Could the first failure of this kind define a class that would skip the remaining upload attempts? When uploading 500 inventories, it's unnecessary to try to push the remaining 450 after the first 50 have saturated the queue.


Subtasks 1 (0 open, 1 closed)

Bug #8170: Broken policies after 7291 (Released, Nicolas CHARLES, 2016-04-12)

Related issues

Related to Rudder - Bug #7290: Inventory upload is not distributed uniformly (Released, Benoît PECCATTE)
Related to Rudder - Architecture #2630: Rudder Webapp and Rudder Inventory should be two different application (Rejected)
Actions #1

Updated by Vincent MEMBRÉ over 5 years ago

  • Assignee set to Matthieu CERDA
  • Target version set to 2.10.20

Clearly, that should be changed in the system technique.

One idea:

If sending an inventory fails because the queue is full, raise a class "inventory_full_queue" and skip sending the remaining inventories.

Actions #2

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 2.10.20 to 277
Actions #3

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 277 to 2.11.18
Actions #4

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 2.11.18 to 2.11.19
Actions #5

Updated by Vincent MEMBRÉ over 5 years ago

  • Target version changed from 2.11.19 to 2.11.20
Actions #6

Updated by Jonathan CLARKE over 5 years ago

  • Tags set to Quick and important
  • Assignee changed from Matthieu CERDA to Alexis MOUSSET

Alexis, this should be a quick fix, could you look into it please?

Actions #7

Updated by Alexis MOUSSET about 5 years ago

  • Status changed from New to In progress
Actions #8

Updated by Alexis MOUSSET about 5 years ago

  • Status changed from In progress to New

It is not that easy since the classes are not evaluated when iterating over the selected files in the transformer.

Actions #9

Updated by Nicolas CHARLES about 5 years ago

We should probably stop using the transformer and instead use a script that iterates over each file.

Actions #10

Updated by Janos Mattyasovszky about 5 years ago

In the short term, it would also help to make the queue size customizable; I presume it scales with the number of CPUs.
In the long term, it would make sense for Jetty itself to monitor the incoming folders via inotify and process inventories automatically, without requiring any scheduler to run.

The story behind this issue is related to the fact that the splayclass is somehow broken, in that there are massive spikes in the upload distribution of inventories, causing 300-500 inventories to be forwarded at 12 past every hour (as shown to and discussed with Jonathan). I will update the other ticket next week.

Actions #11

Updated by Janos Mattyasovszky about 5 years ago

  • Related to Bug #7290: Inventory upload is not distributed uniformly added
Actions #12

Updated by François ARMAND about 5 years ago

In the long term (the code is being written, but I need a couple of days to make it happen), the idea is to completely replace the cf-agent+curl+jetty+webapp chain with a simple, daemonizable Java command-line tool that monitors files as they arrive and just keeps adding them, whether there are 1 or 10000 inventories on the filesystem.

That has the side benefit of allowing the command to be run on a given file, with a much more verbose log level, to help understand why an inventory has a problem.

Actions #13

Updated by Jonathan CLARKE about 5 years ago

Alexis MOUSSET wrote:

It is not that easy since the classes are not evaluated when iterating over the selected files in the transformer.

My tests show the opposite. Run this file, for instance:

body common control {
  bundlesequence => { "test" };
}

bundle agent test {

vars:
  "names" slist => { "0", "1", "2", "3", "4", "5" };

files:
  # Initialise files
  "/tmp/${names}" 
    create => "true",
    edit_defaults => empty,
    edit_line => content("/bin/true");

  "/tmp/2" 
    edit_defaults => empty,
    edit_line => content("/bin/false");

  # Now all files contain "/bin/true" except "/tmp/2" which contains "/bin/false" 

  "/tmp/${names}" 
    ifvarclass  => "!transformer_error",
    transformer => "/bin/bash ${this.promiser}",
    classes     => result("transformer");
}

body classes result(key) {
  repair_failed => { "${key}_error" };
}

bundle edit_line content(line) {
  insert_lines:
    "${line}";
}

body edit_defaults empty
{
  empty_file_before_editing => "true";
}

On CFEngine 3.5.3, 3.6.5, 3.7.2, 3.8.1, this stops execution after the transformer hits /tmp/2. If you comment out the "ifvarclass" attribute, then it goes over all files.

Alexis, can you re-check please and implement this if you agree it works?

Actions #14

Updated by Alexis MOUSSET about 5 years ago

  • Status changed from New to Discussion
  • Assignee changed from Alexis MOUSSET to Jonathan CLARKE

I think it works because you iterate over a CFEngine variable, whereas the inventory code uses a file_select. The following example does not stop after failure:

body common control {
  bundlesequence => { "test" };
}

bundle agent test {

vars:
  "names" slist => { "0", "1", "3", "4", "5" };

files:
  # Initialise files
  "/tmp/${names}.test" 
    create => "true",
    edit_defaults => empty,
    edit_line => content("/bin/true");

  "/tmp/2.test" 
    edit_defaults => empty,
    edit_line => content("/bin/false");

  # Now all .test files contain "/bin/true" except "/tmp/2.test" which contains "/bin/false" 

  "/tmp" 
    file_select => test_files,
    depth_search => recurse_visible(1),
    ifvarclass  => "!transformer_error",
    transformer => "/bin/bash ${this.promiser}",
    classes     => result("transformer");
}

body depth_search recurse_visible(d)
{
        depth        => "${d}";
        exclude_dirs => { "\..*" };
}

body file_select test_files
{
      leaf_name => { ".*.test" };
      file_result => "leaf_name";
}

body classes result(key) {
  repair_failed => { "${key}_error" };
}

bundle edit_line content(line) {
  insert_lines:
    "${line}";
}

body edit_defaults empty
{
  empty_file_before_editing => "true";
}

    info: Transforming '/bin/bash /tmp/2.test' 
   error: Finished command related to promiser '/tmp' -- an error occurred, returned 1
   error: Transformer '/tmp/2.test' => '/bin/bash /tmp/2.test' returned error
    info: Transforming '/bin/bash /tmp/5.test' 
    info: Transformer '/tmp/5.test' => '/bin/bash /tmp/5.test' seemed to work ok
    info: Transforming '/bin/bash /tmp/4.test' 
    info: Transformer '/tmp/4.test' => '/bin/bash /tmp/4.test' seemed to work ok
    info: Transforming '/bin/bash /tmp/3.test' 
    info: Transformer '/tmp/3.test' => '/bin/bash /tmp/3.test' seemed to work ok
    info: Transforming '/bin/bash /tmp/1.test' 
    info: Transformer '/tmp/1.test' => '/bin/bash /tmp/1.test' seemed to work ok
    info: Transforming '/bin/bash /tmp/0.test' 
    info: Transformer '/tmp/0.test' => '/bin/bash /tmp/0.test' seemed to work ok
Actions #15

Updated by Jonathan CLARKE about 5 years ago

  • Assignee changed from Jonathan CLARKE to Alexis MOUSSET

Ah, interesting. Thanks for the further analysis.

According to https://github.com/Normation/rudder-techniques/blob/master/techniques/system/distributePolicy/1.0/propagatePromises.st#L193 and https://github.com/Normation/rudder-techniques/blob/master/techniques/system/common/1.0/site.st#L110, the file_select just gets all files that are *.ocs and *.ocs.gz.

Since this works when iterating over a CFEngine variable, let's try and get into the situation where this works. We could get the list of files in a CFEngine variable using the findfiles function (see https://docs.cfengine.com/docs/3.7/reference-functions-findfiles.html), like this:

  "inventory_files" slist => findfiles("${g.rudder_inventories}/incoming/*.ocs", "${g.rudder_inventories}/incoming/*.ocs.gz");

Alexis, could you test this and propose a PR if it works please?
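
One way to check this proposal in isolation is to adapt the test policy above so that the file list comes from findfiles instead of a file_select. The following is only a sketch under assumptions: it reuses the /tmp/*.test files created by the previous example (run that one first), and the variable name "test_files" is made up for illustration. It has not been run against the actual inventory technique, so whether it stops after the first failure is exactly what it is meant to check.

```cfengine
body common control {
  bundlesequence => { "test" };
}

bundle agent test {

vars:
  # Expand the glob into an slist, so that the files promise below
  # iterates over a CFEngine variable rather than using file_select.
  # "test_files" is a hypothetical name used only in this sketch.
  "test_files" slist => findfiles("/tmp/*.test");

files:
  # One promise per file in the list; each one is skipped as soon as
  # the "transformer_error" class has been defined by an earlier failure.
  "${test_files}"
    ifvarclass  => "!transformer_error",
    transformer => "/bin/bash ${this.promiser}",
    classes     => result("transformer");
}

# Same classes body as in the tests above: define "<key>_error"
# when the promise could not be repaired (transformer returned an error).
body classes result(key) {
  repair_failed => { "${key}_error" };
}
```

If the class is evaluated between iterations here as it is for the slist in the first test, the transformer should no longer be called once one file has failed; files that findfiles returns before the failing one would still be processed.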

Actions #16

Updated by Janos Mattyasovszky about 5 years ago

Hi

I presume the long-term goal of a separate daemon will not be implemented in already released versions, but this fix would be included also in one of the next (supported) versions (like 3.1)?

J

Actions #17

Updated by Jonathan CLARKE about 5 years ago

Janos Mattyasovszky wrote:

I presume the long-term goal of a separate daemon will not be implemented in already released versions, but this fix would be included also in one of the next (supported) versions (like 3.1)?

Yes, exactly. That is our release policy - fix bugs in existing versions, and any major changes in new versions.

Actions #18

Updated by Jonathan CLARKE about 5 years ago

  • Related to Architecture #2630: Rudder Webapp and Rudder Inventory should be two different application added
Actions #19

Updated by Alexis MOUSSET about 5 years ago

  • Status changed from Discussion to Pending technical review
  • Assignee changed from Alexis MOUSSET to Jonathan CLARKE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/905
Actions #20

Updated by Alexis MOUSSET about 5 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
Actions #21

Updated by Vincent MEMBRÉ about 5 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.11.20, 3.0.15, 3.1.9 and 3.2.2 which were released today.
