Architecture #17180
Compliance incorrectly computed with old reports or runs not sent in node chronological order
Description
In 6.1, when the agent sends back old reports for the same node configuration, the "technical logs" are correct, but the compliance bars show the worst report for each expected report.
This means that if a component was in error at some point (and reported as failed), then was repaired on the next run (and reported as repaired), the associated compliance bar stays red even though its current state is OK.
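A minimal sketch of the difference, assuming a simplified model where each report carries the node-side run date and a status with a severity (`Status`, `Report`, `worstPerComponent` and `latestPerComponent` are hypothetical names, not Rudder's code). Merging runs by keeping the worst status gives the red bar described above, while keeping only the most recent run gives the expected result:

```scala
// Hypothetical model for illustration only; these are not Rudder's classes.
sealed abstract class Status(val severity: Int)
case object Success extends Status(0)
case object Repaired extends Status(1)
case object Failed extends Status(2)

// One report for one expected component, tagged with the node-side run date.
final case class Report(runDate: Long, component: String, status: Status)

// Behaviour described in this ticket: merging runs keeps the worst status seen.
def worstPerComponent(reports: Seq[Report]): Map[String, Status] =
  reports.groupBy(_.component).map { case (component, rs) =>
    component -> rs.map(_.status).maxBy(_.severity)
  }

// Intended behaviour: only the most recent run decides the current status.
def latestPerComponent(reports: Seq[Report]): Map[String, Status] =
  reports.groupBy(_.component).map { case (component, rs) =>
    component -> rs.maxBy(_.runDate).status
  }

// An old failing run replayed after the run that repaired the component.
val reports = Seq(
  Report(runDate = 2000L, component = "sshd configuration", status = Repaired),
  Report(runDate = 1000L, component = "sshd configuration", status = Failed)
)
```

Here `worstPerComponent(reports)` yields `Failed` for the component while `latestPerComponent(reports)` yields `Repaired`. Note that `latestPerComponent` trusts the node-side run date, which is precisely the assumption questioned in the comments below.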
UPDATE: in 7.0, we never process old reports (and this is enforced at the relayd level). So the bug explained here is "solved", but not in a totally satisfactory way.
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.1.0~beta1 to 6.1.0~beta2
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.1.0~beta2 to 6.1.0~beta3
Updated by François ARMAND over 4 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND over 4 years ago
- Related to Bug #6005: If a node had a date in the future, but returned to current time, the reporting will always be invalid (until it catches up with the future date) added
Updated by François ARMAND over 4 years ago
In the past, we chose not to rely on the node date/time to decide which run was the "last" one, because node clocks are too easy to mismanage and this led to bugs like #6005 (a node whose clock was in the future, then corrected, keeps an incorrect status until the former future date is reached).
This problem has not disappeared, so I don't know how to handle it correctly.
Updated by François ARMAND over 4 years ago
Some more details: today, we compute compliance based on the last full run received by Rudder, with "last" defined as "last inserted":
/**
 * Find the last execution of nodes, whatever its state.
 * Last execution is defined as "the last executions that have been inserted in the database",
 * and does not rely on date (which can change too often).
 * The goal is to have reporting that does not depend on time, as nodes may have time in the future, or
 * past, or even change it during their lifetime.
 * So the last runs are the last runs inserted in the reports database.
 * See ticket http://www.rudder-project.org/redmine/issues/6005
 */
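To make the trade-off concrete, here is a small sketch comparing the current rule (last inserted) with a rule based on the node date. `AgentRun`, `insertionId` and `nodeDate` are illustrative names, not the actual reports schema; the two scenarios are bug #6005 (node clock in the future, then corrected) and this ticket (an old run replayed by relayd):

```scala
// Hypothetical representation of a received run; field names are illustrative.
// insertionId: server-side, monotonic insertion order in the reports database.
// nodeDate: run date as reported by the node clock.
final case class AgentRun(insertionId: Long, nodeDate: Long, label: String)

// Current rule: the last run is the last one inserted, whatever its date.
def lastByInsertion(runs: Seq[AgentRun]): Option[AgentRun] =
  runs.maxByOption(_.insertionId)

// Alternative rule: trust the node clock.
def lastByNodeDate(runs: Seq[AgentRun]): Option[AgentRun] =
  runs.maxByOption(_.nodeDate)

// Bug #6005 scenario: node clock in the future, then corrected.
val clockFixedRuns = Seq(
  AgentRun(insertionId = 1L, nodeDate = 9000L, label = "clock in the future"),
  AgentRun(insertionId = 2L, nodeDate = 1000L, label = "clock corrected")
)

// This ticket's scenario: relayd replays an old run after a newer one.
val replayedRuns = Seq(
  AgentRun(insertionId = 1L, nodeDate = 2000L, label = "current run"),
  AgentRun(insertionId = 2L, nodeDate = 1000L, label = "old run replayed")
)
```

Neither rule picks the right run in both scenarios: insertion order handles #6005 but selects the replayed run, and the node date handles replayed runs but stays stuck on the future-dated run.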
To change that core hypothesis on report processing, we need to consider the following cases (because time in distributed environments is complicated and clocks are famously reliable):
- what happens when the Rudder server|relayd|node clock is in the future? In the past? Future then past? Past then future?
- how do we prevent a broken state from remaining broken when new reports arrive?
- how do we compute "a reasonable delay since we updated the policy to get a response from the node" when relayd can send historical reports?
It seems that at least part of the answer is to:
- forbid reports dated in the future compared to the Rudder server (even if this breaks when it is Rudder whose clock is in the past, at least it gets corrected once Rudder is correctly configured again),
- still keep the insertion time, to be able to say "whatever the report date, I know that this report is older than some reasonable amount of time, so I should have received a new one",
- find a definition of "last" (don't forget that there were problems caused by differences between run start / report time / run end, and cases where two runs were mixed up, even if that should not happen anymore now that relayd sends full runs).
Of course we won't have correct answers in all cases, especially when part of the system is faulty (bad clock). But we need a clear answer for every case.
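A possible sketch of the three rules above, under assumptions not settled in this ticket: each run record keeps both the node-side run date and the server-side insertion time, "last" is taken as the most recent accepted node date, and the "reasonable delay" is a plain `maxDelay` parameter. `ReceivedRun`, `accepted`, `lastRun` and `isOverdue` are hypothetical names:

```scala
import java.time.{Duration, Instant}

// Hypothetical run record: runDate comes from the node clock,
// insertedAt from the Rudder server clock.
final case class ReceivedRun(runDate: Instant, insertedAt: Instant)

// Rule 1: refuse runs dated after the server's current time.
def accepted(run: ReceivedRun, serverNow: Instant): Boolean =
  !run.runDate.isAfter(serverNow)

// Rule 3, one possible definition of "last": the most recent node-side date
// among accepted runs (epoch millis are used only to get a simple Ordering).
def lastRun(runs: Seq[ReceivedRun], serverNow: Instant): Option[ReceivedRun] =
  runs.filter(accepted(_, serverNow)).maxByOption(_.runDate.toEpochMilli)

// Rule 2: whatever the run date, if the last run was received more than
// maxDelay ago (server clock), a newer one should have arrived.
// No run at all also counts as overdue.
def isOverdue(last: Option[ReceivedRun], serverNow: Instant, maxDelay: Duration): Boolean =
  last.forall(run => Duration.between(run.insertedAt, serverNow).compareTo(maxDelay) > 0)

val now = Instant.parse("2024-01-01T12:00:00Z")
val received = Seq(
  ReceivedRun(Instant.parse("2024-01-01T11:50:00Z"), Instant.parse("2024-01-01T11:51:00Z")),
  // old run replayed by relayd
  ReceivedRun(Instant.parse("2024-01-01T09:00:00Z"), Instant.parse("2024-01-01T11:55:00Z")),
  // node clock in the future: rejected by rule 1
  ReceivedRun(Instant.parse("2030-01-01T00:00:00Z"), Instant.parse("2024-01-01T11:58:00Z"))
)
```

With these values, the future-dated run is rejected, `lastRun` selects the 11:50 run rather than the replayed 09:00 one, and that run is considered overdue only if `maxDelay` is shorter than the 9 minutes elapsed since its insertion. This does not address the case where the Rudder server clock itself is wrong, as noted above.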
Also, for sending old reports to be useful at all, we need to be able to compute compliance for arbitrary dates in the past. That means that:
- we need to keep enough expected reports from the past,
- we need to be able to do the computation, which is not possible now (because it depends heavily on the relative date of "last generation end" compared to "run received"),
- we need to actually do something with these reports: process compliance, save it, act on it. Today, nothing is done in that regard.
So for now, the best solution is perhaps to just disable sending old reports on relayd.
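For the first two requirements listed above (keeping expected reports from the past and finding the ones that applied at an arbitrary date), a minimal sketch could look like this, assuming each expected configuration is stored with a validity interval. `ExpectedConfig` and `expectedAt` are hypothetical names, and the sketch ignores the dependency on "last generation end" versus "run received" as well as saving the computed compliance:

```scala
// Hypothetical history of expected configurations for one node, each valid
// from beginDate (inclusive) until endDate (exclusive, None = still current).
final case class ExpectedConfig(configId: String, beginDate: Long, endDate: Option[Long])

// Find the configuration the node was expected to apply at runDate.
// None means the history no longer goes back that far (or has a gap), so
// compliance for that run cannot be computed.
def expectedAt(history: Seq[ExpectedConfig], runDate: Long): Option[ExpectedConfig] =
  history.find(c => c.beginDate <= runDate && c.endDate.forall(end => runDate < end))

val history = Seq(
  ExpectedConfig("config-1", beginDate = 100L, endDate = Some(200L)),
  ExpectedConfig("config-2", beginDate = 200L, endDate = None)
)
```

A `None` result for an old run means the history was already purged for that date, which is why keeping enough expected reports in the past is a prerequisite.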
Updated by François ARMAND over 4 years ago
- Related to Bug #17349: Disable sending old reports from relayd added
Updated by François ARMAND over 4 years ago
- Subject changed from The webapp compliance does not work as intended when computing a run sending old agent logs to Compliance incorrectly computed when run are not sent in node chronological order (old reports)
- Target version changed from 6.1.0~beta3 to 6.2.0~beta1
Updated by François ARMAND over 4 years ago
- Subject changed from Compliance incorrectly computed when run are not sent in node chronological order (old reports) to Compliance incorrectly computed with old reports or runs not sent in node chronological order
- Status changed from In progress to New
Updated by François ARMAND over 4 years ago
- Related to Architecture #17921: improve searching in ruddersysevents for the reports in store run agents added
Updated by François ARMAND over 4 years ago
- Related to Bug #1466: Rudder should halt on fatal errors during initialization added
Updated by François ARMAND over 4 years ago
- Related to Bug #17181: Old runlog catchup by relayd breaks compliance computation added
Updated by François ARMAND over 4 years ago
- Severity set to Major - prevents use of part of Rudder | no simple workaround
- Priority changed from 0 to 20
Updated by Vincent MEMBRÉ about 4 years ago
- Target version changed from 6.2.0~beta1 to 6.2.0~rc1
- Priority changed from 20 to 38
Updated by Nicolas CHARLES about 4 years ago
- Target version changed from 6.2.0~rc1 to 7.0.0~beta1
- Priority changed from 38 to 19
Cannot be fixed in 6.x; it needs the ongoing changes of 7.0.
Updated by François ARMAND over 3 years ago
- Description updated (diff)
- Target version changed from 7.0.0~beta1 to 7.1.0~beta1
- Priority changed from 19 to 17
As of 7.0, the status is: we never process old reports (and this is enforced at the relayd level). So the bug explained here is "solved", but not in a totally satisfactory way.
Updated by François ARMAND almost 3 years ago
- Tracker changed from Bug to Architecture
- Severity deleted (Major - prevents use of part of Rudder | no simple workaround)
- User visibility deleted (Infrequent - complex configurations | third party integrations)
- Priority deleted (17)
Updated by Vincent MEMBRÉ almost 3 years ago
- Target version changed from 7.1.0~beta1 to 7.1.0~beta2
Updated by Vincent MEMBRÉ almost 3 years ago
- Target version changed from 7.1.0~beta2 to 7.1.0~rc1
Updated by Alexis Mousset almost 3 years ago
- Target version changed from 7.1.0~rc1 to 7.2.0~beta1
Updated by Alexis Mousset over 2 years ago
- Target version changed from 7.2.0~beta1 to 7.3.0~beta1
Updated by Vincent MEMBRÉ almost 2 years ago
- Target version changed from 7.3.0~beta1 to 7.3.0~rc1
Updated by Vincent MEMBRÉ almost 2 years ago
- Target version changed from 7.3.0~rc1 to 7.3.0
Updated by Vincent MEMBRÉ almost 2 years ago
- Target version changed from 7.3.0 to 7.3.1
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.1 to 7.3.2
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.2 to 7.3.3
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.3 to 7.3.4
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.4 to 7.3.5
Updated by Alexis Mousset over 1 year ago
- Target version changed from 7.3.5 to 7.3.6
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.6 to 7.3.7
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.3.7 to 7.3.8
Updated by Vincent MEMBRÉ about 1 year ago
- Target version changed from 7.3.8 to 7.3.9
Updated by Vincent MEMBRÉ about 1 year ago
- Target version changed from 7.3.9 to 7.3.10
Updated by Vincent MEMBRÉ about 1 year ago
- Target version changed from 7.3.10 to 7.3.11
Updated by Vincent MEMBRÉ 12 months ago
- Target version changed from 7.3.11 to 7.3.12
Updated by Vincent MEMBRÉ 11 months ago
- Target version changed from 7.3.12 to 7.3.13
Updated by Vincent MEMBRÉ 11 months ago
- Target version changed from 7.3.13 to 7.3.14
Updated by Vincent MEMBRÉ 9 months ago
- Target version changed from 7.3.14 to 7.3.15
Updated by Vincent MEMBRÉ 8 months ago
- Target version changed from 7.3.15 to 7.3.16
Updated by Vincent MEMBRÉ 7 months ago
- Target version changed from 7.3.16 to 7.3.17