Bug #4497
closed
Rudder web UI freezes when too many inventories are received at the same time
Description
While testing how Rudder scales with the number of nodes, we demonstrated that the endpoint (the war in charge of parsing and saving inventories) may consume all the memory allocated to the Jetty web server, and so deeply impact the usability of the Rudder UI (the second war), leading to monstrous response times, or even a complete freeze of the web UI.
The reason is that the endpoint application's processing of an incoming inventory is split into 3 parts:
- 1/ handling of the HTTP request. That part is responsible for validating that we have a correct HTTP request (POST, correct URL, posted document available, etc.)
- 2/ checking that the posted document is actually an XML file that we can parse as a Fusion Inventory report, and which contains the required tags for Rudder (UUID, etc.)
- 3/ actually saving the report in the correct status in our LDAP database (checking if it is already present, updating what needs to be, etc).
Part 1/ is handled by Jetty; there is nothing much to say about it.
Parts 2/ and 3/ are asynchronous, so that at the end of part 2/, we are already able to answer the HTTP request ("ok, I'm processing your inventory" or "failed precondition" or other error status). So we have a queue used to communicate between steps 2/ and 3/.
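To make the hand-off between steps 2/ and 3/ concrete, here is a minimal sketch of that structure, not the actual endpoint code. The class name, the method names, and the `contains("<UUID>")` check (a crude placeholder for real XML parsing) are all assumptions made for illustration; the LDAP save is only a comment.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the two asynchronous steps and the queue between them.
class InventoryPipelineSketch {
    // Unbounded queue between step 2/ (parse) and step 3/ (save).
    private final BlockingQueue<String> parsed = new LinkedBlockingQueue<>();

    /** Step 2/: check the document, queue it, and answer the request at once. */
    public String receive(String xml) {
        // Placeholder check standing in for real parsing and tag validation.
        if (xml == null || !xml.contains("<UUID>")) {
            return "failed precondition";
        }
        parsed.add(xml); // always succeeds: nothing limits the backlog
        return "ok, I'm processing your inventory";
    }

    /** Step 3/: save one queued inventory; returns false if nothing is queued. */
    public boolean saveNext() {
        String next = parsed.poll();
        if (next == null) {
            return false;
        }
        // A real implementation would check and update the LDAP database here.
        return true;
    }

    /** Number of parsed inventories waiting for step 3/. */
    public int backlog() {
        return parsed.size();
    }

    public static void main(String[] args) {
        InventoryPipelineSketch pipeline = new InventoryPipelineSketch();
        System.out.println(pipeline.receive("<REQUEST><UUID>node-1</UUID></REQUEST>"));
        System.out.println(pipeline.receive("<REQUEST/>"));
        System.out.println("backlog: " + pipeline.backlog());
    }
}
```

In this sketch every call to `receive` returns immediately, and `backlog()` only shrinks when `saveNext()` runs, which is the decoupling described above.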
The problem is that 2/ is much quicker than 3/. So parsed documents accumulate in the queue, and a parsed XML document may take quite a lot of memory (from a few MB to tens of MB for big inventories).
At that point, we reach classic JVM memory exhaustion, where the GC can't free enough memory for the next action, and so it spends more and more time trying to free memory while less and less is available.
If we send inventories less frequently (one every ten seconds in our tests) so that step 3/ can complete before a new inventory arrives, we are able to sustain the reception of hundreds of inventories without any impact on performance.
We suspect that this problem may be the root cause of several reported freezes of the Rudder web application, like #4425.
Solution:
The first (easy) step is to have two separate Java web servers (Jetty or other) so that neither impacts the other (for reference, an endpoint needs less than 256 MB of heap space to work correctly when the queue is bounded).
A second possibility (or step) is to bound the maximum number of queued inventories allowed in the endpoint.
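As an illustration of the bounded queue, here is a minimal sketch using `java.util.concurrent.ArrayBlockingQueue`. The class name, the method names, and the choice of answering 202 or 503 are assumptions for this example, not the behaviour of the existing endpoint.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a queue that refuses new inventories once it is full.
class BoundedInventoryQueue {
    private final BlockingQueue<String> queue;

    public BoundedInventoryQueue(int maxQueued) {
        // ArrayBlockingQueue has a fixed capacity set at construction time.
        this.queue = new ArrayBlockingQueue<>(maxQueued);
    }

    /**
     * Tries to queue a parsed inventory without blocking.
     * Returns an assumed HTTP status: 202 if queued, 503 if the queue is full.
     */
    public int submit(String parsedInventory) {
        // offer() returns false instead of blocking or throwing when full.
        return queue.offer(parsedInventory) ? 202 : 503;
    }

    /** Takes the next inventory for step 3/, or null if none is waiting. */
    public String pollNext() {
        return queue.poll();
    }

    public int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedInventoryQueue q = new BoundedInventoryQueue(2);
        System.out.println(q.submit("node-1"));
        System.out.println(q.submit("node-2"));
        System.out.println(q.submit("node-3")); // queue is full at this point
    }
}
```

With this approach the memory held by queued inventories is capped by the queue size, and a node whose inventory is refused has to send it again later.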
A third possibility (evolution) is to transform what is today the "endpoint" web application into a daemon in charge of reading inventory files in the incoming directory and processing them (actually limiting the number of inventories processed at once to "one", or another maximum chosen according to the concurrency capabilities of the machine).
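A rough sketch of that third option, processing files from an incoming directory strictly one at a time, could look like the following. The class, the `InventoryHandler` hook (standing in for "parse and save in LDAP"), the `processed` directory and the file names are all hypothetical; a real daemon would also need to loop, handle failures and watch for new files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: process inventory files sequentially from a directory.
class IncomingDirectoryProcessor {
    /** Hypothetical hook standing in for parsing and saving one inventory. */
    interface InventoryHandler {
        void process(Path inventoryFile) throws IOException;
    }

    private final Path incoming;
    private final Path processed;
    private final InventoryHandler handler;

    public IncomingDirectoryProcessor(Path incoming, Path processed, InventoryHandler handler) {
        this.incoming = incoming;
        this.processed = processed;
        this.handler = handler;
    }

    /** Processes each file currently in the incoming directory, one at a time. */
    public int drainOnce() throws IOException {
        // Only paths are listed up front; file contents are read one by one.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(incoming)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // deterministic order by path
        int count = 0;
        for (Path file : files) {
            handler.process(file);
            // Move the file away so it is not picked up again on the next pass.
            Files.move(file, processed.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            count++;
        }
        return count;
    }

    /** Small demo on temporary directories; returns the number of files processed. */
    static int demo() {
        try {
            Path in = Files.createTempDirectory("incoming");
            Path done = Files.createTempDirectory("processed");
            Files.write(in.resolve("node-a.xml"), "<REQUEST/>".getBytes(StandardCharsets.UTF_8));
            Files.write(in.resolve("node-b.xml"), "<REQUEST/>".getBytes(StandardCharsets.UTF_8));
            IncomingDirectoryProcessor daemon = new IncomingDirectoryProcessor(
                    in, done, file -> System.out.println("processing " + file.getFileName()));
            return daemon.drainOnce();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("processed: " + demo());
    }
}
```

Because pending inventories stay on disk until their turn comes, only one parsed document needs to be in memory at any time in this sketch.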