Bug #16612

closed

Bug #16557: CachedFindRuleNodeStatusReports is a huge source of contention

Bug #16565: New cache doesn't return all compliance

New compliance cache must return expired data

Added by François ARMAND over 4 years ago. Updated about 4 years ago.

Status:
Released
Priority:
N/A
Category:
Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
Reviewed
Fix check:
Checked
Regression:

Description

Currently, we remove expired compliance from the values returned by the cache. However, the resulting behavior is strange and surprising when Rudder is under load and compliance takes time to compute.

Imagine reports are coming in regularly and compliance needs continual updates. It may happen (especially after a full cache invalidation following a full generation) that not all cache entries have been updated after 5 minutes. In that case, all the nodes not yet updated will disappear from compliance queries, while in fact they are most likely just being processed.

This leads to huge variations in the number of nodes reported, which is scary and unworkable for users.

We propose adding the same latency that exists in other parts of Rudder before declaring expiration (i.e. expiration + 2 runs). During that period, the cached data is returned. After that, we return a NoReportInInterval status (i.e. we keep the knowledge that we do have data in the cache, but that it is no longer relevant).
Once the compliance is finally computed for the node, if there really was no run in the interval, we will have that information in the cache and will stop asking for a compliance update from the cache (the next update will be triggered by a run or a policy generation).
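
To make the proposed rule concrete, here is a minimal sketch of the lookup decision in Python. It is not the Rudder implementation (which is written in Scala): the names `CachedCompliance`, `CacheAnswer`, `lookup`, and `GRACE_RUNS` are hypothetical, and it assumes each cache entry knows its nominal expiration time and the node's run interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class CacheAnswer(Enum):
    """Hypothetical outcomes of a cache lookup under the proposed rule."""
    CACHED = "cached"                            # return the cached compliance as-is
    NO_REPORT_IN_INTERVAL = "NoReportInInterval"  # data exists but is no longer relevant


@dataclass
class CachedCompliance:
    """Hypothetical cache entry; field names are assumptions for illustration."""
    node_id: str
    expiration: datetime     # nominal expiration of the cached compliance
    run_interval: timedelta  # agent run period for this node


# "expiration + 2 runs", as proposed in the description above.
GRACE_RUNS = 2


def lookup(entry: CachedCompliance, now: datetime) -> CacheAnswer:
    """Decide what the cache returns for an entry at time `now`.

    The entry is never dropped from the result: within the grace period the
    cached data is returned, and afterwards a NoReportInInterval status is.
    """
    grace_deadline = entry.expiration + GRACE_RUNS * entry.run_interval
    if now <= grace_deadline:
        return CacheAnswer.CACHED
    return CacheAnswer.NO_REPORT_IN_INTERVAL
```

With a 5-minute run interval and an expiration at 12:00, a lookup at 12:09 would still return the cached data, while a lookup at 12:11 would return the NoReportInInterval status. In both cases the node stays visible in compliance queries, which is the point of the change.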
