Architecture #18092


Improve compliance performance and reliability

Added by Nicolas CHARLES 9 months ago. Updated 8 months ago.

Web - Compliance & node report
Target version:
Effort required:


Compliance computation has evolved over time, improving through minimal changes and small big bangs.
It works well for most use cases, but it has some shortcomings:
  • we don't historize missing runs. If a node doesn't answer for 2 hours, we don't track that in the database
  • when we catch up on reports that were not sent in time, chaos in compliance follows
  • I finally succeeded in producing a bottleneck in compliance computation on the load platform (sending hundreds of inventories every 5 seconds), which shows how sensitive PostgreSQL is to I/O for our usage.
  • there is a lot of back and forth between Rudder and the database: listing runs and saving them, getting the runs we just wrote and their nodeconfigid, getting the nodeconfigurations, and getting the reports for these runs. Now that rudder-relayd saves runs in the database, the first part should be outsourced to rudder-relayd, which knows exactly which runs it saves
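
To make the first point concrete, here is a minimal sketch of how missing runs could be derived from the runs that were actually received, so that they can be historized. It is not Rudder's implementation: the names `MissingRun` and `find_missing_runs`, the 5-minute run interval and the 1-minute grace period are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class MissingRun:
    """Hypothetical record for a run that was expected but never received."""
    node_id: str
    expected_at: datetime


def find_missing_runs(
    node_id: str,
    run_times: List[datetime],
    interval: timedelta,
    grace: timedelta,
    now: datetime,
) -> List[MissingRun]:
    """Return one MissingRun per expected slot for which no run was received.

    A slot is expected every `interval` after the previous received run.
    It counts as missing once `grace` has elapsed past it without a run.
    """
    missing: List[MissingRun] = []
    times = sorted(run_times)
    if not times:
        # Without any received run there is no reference point to count from.
        return missing
    # Check the gap between consecutive runs, then between the last run and now.
    boundaries = times + [now]
    for previous, current in zip(boundaries, boundaries[1:]):
        expected = previous + interval
        while expected + grace < current:
            missing.append(MissingRun(node_id, expected))
            expected += interval
    return missing


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 10, 0)
    # Two normal runs, then silence for 2 hours before the next run.
    runs = [start, start + timedelta(minutes=5), start + timedelta(minutes=125)]
    gaps = find_missing_runs(
        "node-1",  # hypothetical node id
        runs,
        interval=timedelta(minutes=5),
        grace=timedelta(minutes=1),
        now=start + timedelta(minutes=126),
    )
    print(len(gaps), "missing runs, first expected at", gaps[0].expected_at)
```

Records like these could then be stored next to the real runs, so that a 2-hour silence stays visible in the history instead of disappearing once the node reports again.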

This is a meta ticket: the reporting improvements will happen in sub-tickets, which will hopefully fix all these issues.

Subtasks 5 (5 open, 0 closed)

Architecture #18093: Simplify queries to detects runs in database, as all runs are complete | Pending release | Elaad FURREEDAN
Architecture #18910: Simpifying agent run request breaks test | Pending release | Nicolas CHARLES
Architecture #18255: Don't compute runs information, but use data from rudder-relayd | Pending release | François ARMAND
Bug #19117: Remove call to executions.complete attribute which was removed | Pending release | François ARMAND
Architecture #19151: add caching for NodeExpectedReports | In progress | Nicolas CHARLES
#1

Updated by Nicolas CHARLES 8 months ago

  • Target version changed from 6.2.0~beta1 to 7.0.0~alpha1
