Architecture #18092
Improve compliance performance and reliability
Status: open
Pull Request:
Effort required:
Name check: To do
Fix check: To do
Regression:
Description
Compliance computation has evolved over time, improving through a series of minimal changes and a few small "big bang" rewrites.
It works well for most use cases, but it has some shortcomings:
- we don't keep a history of missing runs: if a node doesn't answer for 2 hours, that is not tracked in the database
- when we catch up on reports that weren't sent in time, chaos in compliance follows
- I finally succeeded in triggering a bottleneck in compliance computation on the load platform (sending hundreds of inventories every 5 seconds), which shows how sensitive PostgreSQL is to I/O for our usage
- there is a lot of back and forth between Rudder and the database: listing runs and saving them, fetching the runs we just wrote along with their nodeconfigid, fetching the node configurations, and fetching the reports for these runs. Now that rudder-relayd saves runs in the database, the first part should be delegated to rudder-relayd, which knows exactly which runs it saves
This is a meta ticket: the reporting improvements will happen in sub-tickets, which will hopefully fix all of these issues.
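To make the first shortcoming concrete, here is a minimal sketch of what tracking missing runs could look like. It is not Rudder code: the function name `missing_run_gaps`, the `grace` parameter, and the 5-minute run interval in the usage example are all assumptions made for illustration. Given the run timestamps known for a node, it returns the periods during which the node was silent for longer than expected, which is the information that is currently not stored.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def missing_run_gaps(
    run_times: List[datetime],
    expected_interval: timedelta,
    grace: timedelta = timedelta(0),
) -> List[Tuple[datetime, datetime]]:
    """Return (last_run_before_gap, first_run_after_gap) pairs.

    Hypothetical helper: a gap is reported when two consecutive runs are
    farther apart than expected_interval + grace.
    """
    gaps = []
    ordered = sorted(run_times)
    # Compare each run with the one that follows it.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval + grace:
            gaps.append((previous, current))
    return gaps


# Usage: a node that is silent for about two hours (assumed 5-minute run interval).
runs = [
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 1, 10, 5),
    datetime(2020, 1, 1, 12, 10),
    datetime(2020, 1, 1, 12, 15),
]
gaps = missing_run_gaps(runs, timedelta(minutes=5), grace=timedelta(minutes=1))
for start, end in gaps:
    print(f"no run between {start} and {end}")
```

A real implementation would also have to handle a node that is still silent (no closing run yet) and would persist the gaps rather than recompute them, which this sketch does not attempt.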