User story #7291
Exit sending inventory to rudder-webapp if the queue is full (closed)
Description
(Related to #7290)
Apparently, when the inventory-processing queue has reached its limit of 50 concurrent inventories, the service returns a 503 and logs an error.
However, the agent does not react to that and tries to force all uploads through, hammering the service.
# cf-agent -KIC -b sendInventoryToCmdb | grep -w 503
2015-10-18T00:21:19+0200 info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:19+0200 error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:19+0200 info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:19+0200 error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:20+0200 info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:20+0200 error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
2015-10-18T00:21:20+0200 info: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: curl: (22) The requested URL returned error: 503
2015-10-18T00:21:20+0200 error: /default/sendInventoryToCmdb/files/'/var/rudder/inventories/accepted-nodes-updates'[0]: Finished command related to promiser '/var/rudder/inventories/accepted-nodes-updates' -- an error occurred, returned 22
(some lines were removed).
Could the first failure of this kind define a class that skips the remaining upload attempts? When uploading 500 inventories, there is no point in pushing the remaining 450 after the first 50 have saturated the queue.
Updated by Vincent MEMBRÉ about 9 years ago
- Assignee set to Matthieu CERDA
- Target version set to 2.10.20
Clearly, that should be changed in the system technique.
One idea:
If sending an inventory fails because the queue is full, raise a class "inventory_full_queue" and skip sending the remaining inventories.
Updated by Vincent MEMBRÉ about 9 years ago
- Target version changed from 2.10.20 to 277
Updated by Vincent MEMBRÉ about 9 years ago
- Target version changed from 277 to 2.11.18
Updated by Vincent MEMBRÉ about 9 years ago
- Target version changed from 2.11.18 to 2.11.19
Updated by Vincent MEMBRÉ almost 9 years ago
- Target version changed from 2.11.19 to 2.11.20
Updated by Jonathan CLARKE almost 9 years ago
- Tags set to Quick and important
- Assignee changed from Matthieu CERDA to Alexis Mousset
Alexis, this should be a quick fix, could you look into it please?
Updated by Alexis Mousset almost 9 years ago
- Status changed from New to In progress
Updated by Alexis Mousset almost 9 years ago
- Status changed from In progress to New
It is not that easy since the classes are not evaluated when iterating over the selected files in the transformer.
Updated by Nicolas CHARLES almost 9 years ago
We should probably stop using the transformer and instead use a script that iterates over each file.
Updated by Janos Mattyasovszky almost 9 years ago
In the short term, it would also help to be able to customize the queue size; I presume it scales with the number of CPUs.
In the long term, it would make sense to make this part of the Jetty application: monitor the incoming folders via inotify and process inventories automatically, without requiring any scheduled run.
The story behind this issue is related to the fact that the splayclass is somehow broken, in the sense that there are massive spikes in the distribution of inventory uploads, causing 300-500 inventories to be forwarded at 12 minutes past every hour (as shown to and discussed with Jonathan). I will update the other ticket next week.
Updated by Janos Mattyasovszky almost 9 years ago
- Related to Bug #7290: Inventory upload is not distributed uniformly added
Updated by François ARMAND almost 9 years ago
In the long term (the code is being written, but I need a couple of days to make it happen), the idea is to completely replace the cf-agent+curl+jetty+webapp chain with a simple, daemonizable Java command-line tool that will monitor files as they arrive and just keep adding them, whether there are 1 or 10000 inventories on the filesystem.
This has the side benefit of allowing the command to be run on a single file, with a much more verbose log level, to help understand why an inventory has a problem.
Updated by Jonathan CLARKE almost 9 years ago
Alexis MOUSSET wrote:
It is not that easy since the classes are not evaluated when iterating over the selected files in the transformer.
My tests show the opposite. Run this file, for instance:
body common control
{
  bundlesequence => { "test" };
}

bundle agent test
{
  vars:
    "names" slist => { "0", "1", "2", "3", "4", "5" };

  files:
    # Initialise files
    "/tmp/${names}"
      create => "true",
      edit_defaults => empty,
      edit_line => content("/bin/true");

    "/tmp/2"
      edit_defaults => empty,
      edit_line => content("/bin/false");

    # Now all files contain "/bin/true" except "/tmp/2" which contains "/bin/false"
    "/tmp/${names}"
      ifvarclass => "!transformer_error",
      transformer => "/bin/bash ${this.promiser}",
      classes => result("transformer");
}

body classes result(key)
{
  repair_failed => { "${key}_error" };
}

bundle edit_line content(line)
{
  insert_lines:
    "${line}";
}

body edit_defaults empty
{
  empty_file_before_editing => "true";
}
On CFEngine 3.5.3, 3.6.5, 3.7.2, 3.8.1, this stops execution after the transformer hits /tmp/2. If you comment out the "ifvarclass" attribute, then it goes over all files.
Alexis, can you re-check please and implement this if you agree it works?
Updated by Alexis Mousset almost 9 years ago
- Status changed from New to Discussion
- Assignee changed from Alexis Mousset to Jonathan CLARKE
I think it works because you iterate over a CFEngine variable, whereas the inventory code uses a file_select. The following example does not stop after failure:
body common control
{
  bundlesequence => { "test" };
}

bundle agent test
{
  vars:
    "names" slist => { "0", "1", "3", "4", "5" };

  files:
    # Initialise files
    "/tmp/${names}.test"
      create => "true",
      edit_defaults => empty,
      edit_line => content("/bin/true");

    "/tmp/2.test"
      create => "true",
      edit_defaults => empty,
      edit_line => content("/bin/false");

    # Now all files contain "/bin/true" except "/tmp/2.test" which contains "/bin/false"
    "/tmp"
      file_select => test_files,
      depth_search => recurse_visible(1),
      ifvarclass => "!transformer_error",
      transformer => "/bin/bash ${this.promiser}",
      classes => result("transformer");
}

body depth_search recurse_visible(d)
{
  depth => "${d}";
  exclude_dirs => { "\..*" };
}

body file_select test_files
{
  leaf_name => { ".*.test" };
  file_result => "leaf_name";
}

body classes result(key)
{
  repair_failed => { "${key}_error" };
}

bundle edit_line content(line)
{
  insert_lines:
    "${line}";
}

body edit_defaults empty
{
  empty_file_before_editing => "true";
}
info: Transforming '/bin/bash /tmp/2.test'
error: Finished command related to promiser '/tmp' -- an error occurred, returned 1
error: Transformer '/tmp/2.test' => '/bin/bash /tmp/2.test' returned error
info: Transforming '/bin/bash /tmp/5.test'
info: Transformer '/tmp/5.test' => '/bin/bash /tmp/5.test' seemed to work ok
info: Transforming '/bin/bash /tmp/4.test'
info: Transformer '/tmp/4.test' => '/bin/bash /tmp/4.test' seemed to work ok
info: Transforming '/bin/bash /tmp/3.test'
info: Transformer '/tmp/3.test' => '/bin/bash /tmp/3.test' seemed to work ok
info: Transforming '/bin/bash /tmp/1.test'
info: Transformer '/tmp/1.test' => '/bin/bash /tmp/1.test' seemed to work ok
info: Transforming '/bin/bash /tmp/0.test'
info: Transformer '/tmp/0.test' => '/bin/bash /tmp/0.test' seemed to work ok
Updated by Jonathan CLARKE almost 9 years ago
- Assignee changed from Jonathan CLARKE to Alexis Mousset
Ah, interesting. Thanks for the further analysis.
According to https://github.com/Normation/rudder-techniques/blob/master/techniques/system/distributePolicy/1.0/propagatePromises.st#L193 and https://github.com/Normation/rudder-techniques/blob/master/techniques/system/common/1.0/site.st#L110, the file_select just gets all files that are *.ocs and *.ocs.gz.
Since this works when iterating over a CFEngine variable, let's try to get into that situation. We could get the list of files in a CFEngine variable using the findfiles function (see https://docs.cfengine.com/docs/3.7/reference-functions-findfiles.html), like this:
"inventory_files" slist => findfiles("${g.rudder_inventories}/incoming/*.ocs", "${g.rudder_inventories}/incoming/*.ocs.gz");
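A rough sketch of how the findfiles list could be combined with the ifvarclass guard from the earlier test policy; this is an illustration, not the code from the eventual pull request. The bundle name `send_inventories_sketch`, the `upload_url` value, the `/usr/bin/curl` path and the curl flags are all assumptions; only `g.rudder_inventories` and the findfiles globs come from the line above. Since `curl -f` exits with 22 on any HTTP error (as in the log at the top of this ticket), the guard class would be raised by the first failed upload of any kind, not only by a 503.

```cfengine
# Hypothetical sketch: bundle name, upload_url, curl path and flags are assumptions.
bundle agent send_inventories_sketch
{
  vars:
      # Assumed upload endpoint; substitute the one the system technique really uses.
      "upload_url" string => "http://localhost:8080/endpoint/upload/";

      # Listing the files in a CFEngine variable (instead of using a file_select)
      # makes the files promise below iterate once per list element, so the
      # ifvarclass guard is checked again before each upload.
      "inventory_files" slist => findfiles("${g.rudder_inventories}/incoming/*.ocs",
                                           "${g.rudder_inventories}/incoming/*.ocs.gz");

  files:
      # Skip all remaining uploads once one has failed (e.g. curl exit 22 on a 503).
      "${inventory_files}"
        ifvarclass  => "!inventory_upload_error",
        transformer => "/usr/bin/curl -f -s -F file=@${this.promiser} ${upload_url}",
        classes     => result("inventory_upload");
}

# Same classes body as in the test policy: defines "<key>_error" when the promise fails.
body classes result(key)
{
    repair_failed => { "${key}_error" };
}
```

The sketch deliberately leaves out what happens to successfully sent files (moving or deleting them), which a real technique would still need to handle.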
Alexis, could you test this and propose a PR if it works please?
Updated by Janos Mattyasovszky almost 9 years ago
Hi
I presume the long-term goal of a separate daemon will not be implemented in already released versions, but this fix would be included also in one of the next (supported) versions (like 3.1)?
J
Updated by Jonathan CLARKE almost 9 years ago
Janos Mattyasovszky wrote:
I presume the long-term goal of a separate daemon will not be implemented in already released versions, but this fix would be included also in one of the next (supported) versions (like 3.1)?
Yes, exactly. That is our release policy - fix bugs in existing versions, and any major changes in new versions.
Updated by Jonathan CLARKE almost 9 years ago
- Related to Architecture #2630: Rudder Webapp and Rudder Inventory should be two different application added
Updated by Alexis Mousset almost 9 years ago
- Status changed from Discussion to Pending technical review
- Assignee changed from Alexis Mousset to Jonathan CLARKE
- Pull Request set to https://github.com/Normation/rudder-techniques/pull/905
Updated by Alexis Mousset almost 9 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset rudder-techniques|6e459a2dbe0cc8dd8a6ba55968cb75e4ecfb6269.
Updated by Vincent MEMBRÉ almost 9 years ago
- Status changed from Pending release to Released