Bug #16773

closed

Batch of new nodes can overflow rudder server with inventories

Added by François ARMAND about 4 years ago. Updated over 3 years ago.

Status: Released
Priority: N/A
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority: 45
Name check: To do
Fix check: Checked
Regression:

Description

We decided to make non accepted nodes send their inventories more often: https://issues.rudder.io/issues/9676 "An agent run with initial promises should send its inventory more often"

The unforeseen effect of that decision is that if you add a batch of nodes at the same time (in the hundreds), they start spamming the Rudder server with inventories, and inventories are often rejected because the processing queue is full.
If you are unlucky, it is always the same nodes that get processed.

We should add safeguards on the server side to reject inventories for new nodes that are already in the processing queue (and only new nodes, I believe).
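A minimal sketch of that safeguard, assuming a bounded in-memory queue keyed by node ID and holding only inventories from new nodes. The class name `PendingInventoryQueue`, its methods and the returned status strings are hypothetical illustrations, not Rudder's actual code:

```python
from collections import OrderedDict
from typing import Optional, Tuple


class PendingInventoryQueue:
    """Bounded FIFO holding at most one pending inventory per new node (hypothetical)."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        # node_id -> inventory path, kept in arrival order
        self._pending: "OrderedDict[str, str]" = OrderedDict()

    def offer(self, node_id: str, path: str) -> str:
        """Try to queue an inventory; return 'queued', 'duplicate' or 'full'."""
        if node_id in self._pending:
            # This node is already waiting: reject without consuming a slot.
            return "duplicate"
        if len(self._pending) >= self.max_size:
            return "full"
        self._pending[node_id] = path
        return "queued"

    def take(self) -> Optional[Tuple[str, str]]:
        """Remove and return the oldest (node_id, path), or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)
```

With this shape, a node that resends its inventory every few minutes occupies a single slot, so a few noisy nodes cannot fill the queue and starve the others.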
We should also make nodes send their inventory more often only for one or two hours. The problems described in ticket #9676 don't match the case of a node that is still not accepted after, say, 3 days.
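The time-limited boost could look like the following sketch. The function name and all three durations are assumptions chosen for illustration (the ticket only says "one or two hours"), and the real logic would live in the agent's policies rather than in Python:

```python
from datetime import datetime, timedelta

# All three values are assumptions for illustration only.
BOOST_WINDOW = timedelta(hours=2)        # how long a new node sends more often
BOOSTED_INTERVAL = timedelta(minutes=5)  # interval during the boost window
NORMAL_INTERVAL = timedelta(hours=24)    # interval once the window has passed


def inventory_interval(first_run: datetime, now: datetime) -> timedelta:
    """Return the delay to wait before a non-accepted node sends its next inventory."""
    if now - first_run < BOOST_WINDOW:
        return BOOSTED_INTERVAL
    return NORMAL_INTERVAL
```

A node still pending after 3 days then falls back to the normal interval instead of adding load to the server indefinitely.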


Files

clipboard-202011132228-mpb2z.png (137 KB), François ARMAND, 2020-11-13 22:28

Related issues 1 (0 open, 1 closed)

Related to Rudder - User story #9676: An agent run with initial promises should send its inventory more often (Released, Nicolas CHARLES)
Actions #1

Updated by François ARMAND about 4 years ago

  • Related to User story #9676: An agent run with initial promises should send its inventory more often added
Actions #2

Updated by François ARMAND about 4 years ago

  • Description updated (diff)
Actions #3

Updated by Vincent MEMBRÉ about 4 years ago

  • Target version changed from 5.0.17 to 5.0.18
Actions #4

Updated by François ARMAND almost 4 years ago

  • Category set to Performance and scalability
  • Target version changed from 5.0.18 to 6.2.0~beta1
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 24

This change is perhaps too big for a patch version.

Actions #5

Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 6.2.0~beta1 to 6.2.0~rc1
  • Priority changed from 24 to 46
Actions #6

Updated by François ARMAND over 3 years ago

  • Assignee set to François ARMAND
  • Priority changed from 46 to 45
Actions #7

Updated by François ARMAND over 3 years ago

Actually, it seems like a bug and something that can be corrected in 6.1. It is just missing a buffer
(the AddToQL part can deduplicate inventories).

Actions #8

Updated by François ARMAND over 3 years ago

OK, so it's not so simple: we return an "inventory status" to callers, and that status requires a signature check, which means we need to parse both the signature file (OK, it's small) and the inventory (not OK). Parsing the inventory is needed to get:

- nodeId (used to check certificate subject),
- certificate (for public key).

So, we can make all of that MUCH simpler, but it will be an API change:

- the REST API only copies the inventory / signature files to /var/inventories/received (no special treatment for it),
- same logic for inotify and periodic catch-up,
- only one big queue of small InventoryFileInfo structures (it can likely hold 10 000 elements for the cost of one parsed inventory),
- dequeue does the XML parsing, signature check, etc. (but here, see, it returns Unit: nobody knows when it will happen).
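The proposed design can be sketched roughly as follows. Only the `InventoryFileInfo` name comes from the ticket; its fields, the `InventoryProcessor` class and the `handle` callback (standing in for XML parsing plus signature check) are hypothetical:

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class InventoryFileInfo:
    """Small record pointing at files on disk; nothing is parsed yet (fields assumed)."""
    inventory_path: str
    signature_path: str


class InventoryProcessor:
    """Single bounded queue of file references; the heavy work happens at dequeue."""

    def __init__(self, handle: Callable[[InventoryFileInfo], None],
                 max_size: int = 10_000) -> None:
        self._queue: Deque[InventoryFileInfo] = deque()
        self._max_size = max_size
        self._handle = handle  # stand-in for XML parsing + signature check

    def enqueue(self, info: InventoryFileInfo) -> bool:
        """Cheap step shared by every entry point; False means the queue is full."""
        if len(self._queue) >= self._max_size:
            return False
        self._queue.append(info)
        return True

    def drain(self) -> int:
        """Process queued files in arrival order; return how many were handled."""
        handled = 0
        while self._queue:
            self._handle(self._queue.popleft())
            handled += 1
        return handled
```

Because `enqueue` only stores two paths, callers get no signature-check result back, which is the API change mentioned above.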

For 6.1, we can try to add a buffer on the standard path (i.e. not the REST API), and only there.

Actions #9

Updated by François ARMAND over 3 years ago

  • Status changed from New to In progress
Actions #10

Updated by François ARMAND over 3 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/3367
Actions #11

Updated by François ARMAND over 3 years ago

  • Status changed from Pending technical review to Pending release
Actions #12

Updated by François ARMAND over 3 years ago

  • Fix check changed from To do to Checked
Actions #13

Updated by Vincent MEMBRÉ over 3 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.1.7 which was released today.
