Project

General

Profile

Actions

Bug #8363

closed

Bug #8051: Compliance is not correctly computed if we receive run agent right after generation

Some agent runs are not seen in the update process

Added by François ARMAND over 8 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
1 (highest)
Category:
Web - Compliance & node report
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
Name check:
Fix check:
Regression:

Description

We have a process that looks for new execution reports, and from that update correspondeng agent runs, setting them as closed when needed.

But it seems that sometime, especially for long runs (>10s), the last report id is not well found back. More precissely, we get logs like that, with the inconsistencies highlighted with ** and @@:

[2016-05-24 00:00:14] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((1987663,2016-05-24T00:00:10.477+02:00)))
[2016-05-24 00:00:14] DEBUG report - [Store Agent Run Times #610] checking agent runs from SQL ID 1987663 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:14.018+02:00]
[2016-05-24 00:00:14] TRACE com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update runs for nodes [root]
[2016-05-24 00:00:14] WARN  com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update agent run cache for nodes []
[2016-05-24 00:00:14] DEBUG report - [Store Agent Run Times #610] (283 ms) Added or updated 1 agent runs, up to SQL ID 1987665 (last run time was 2016-05-24T00:00:10.477+02:00)
[2016-05-24 00:00:14] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: **1987665**
[2016-05-24 00:00:14] DEBUG com.normation.rudder.services.reports.CachedReportingServiceImpl - Compliance cache: invalidate cache for nodes: []
[2016-05-24 00:00:14] WARN  report - Hooks execution time: 3 milliseconds

[2016-05-24 00:00:19] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((**1987677**,2016-05-24T00:00:10.477+02:00)))
[2016-05-24 00:00:19] DEBUG report - [Store Agent Run Times #611] checking agent runs from SQL ID 1987677 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:19.306+02:00]
[2016-05-24 00:00:19] TRACE com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update runs for nodes [root]
[2016-05-24 00:00:19] WARN  com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update agent run cache for nodes []
[2016-05-24 00:00:19] DEBUG report - [Store Agent Run Times #611] (65 ms) Added or updated 1 agent runs, up to SQL ID 1987750 (last run time was 2016-05-24T00:00:10.477+02:00)
[2016-05-24 00:00:19] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: @@1987750@@
[2016-05-24 00:00:19] DEBUG com.normation.rudder.services.reports.CachedReportingServiceImpl - Compliance cache: invalidate cache for nodes: []
[2016-05-24 00:00:19] WARN  report - Hooks execution time: 3 milliseconds

[2016-05-24 00:00:19] DEBUG com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger - Writting non-compliant-report logs beetween ids 1987664 and 1987761 (both incuded)
[2016-05-24 00:00:19] DEBUG com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger - Wrote non-compliant-reports logs from id '1987664' to id '1987761'

[2016-05-24 00:00:24] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((@@1987763@@,2016-05-24T00:00:10.477+02:00)))
[2016-05-24 00:00:24] DEBUG report - [Store Agent Run Times #612] checking agent runs from SQL ID 1987763 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:24.377+02:00]
[2016-05-24 00:00:24] DEBUG report - [Store Agent Run Times #612] (4 ms) Added or updated 0 agent runs
[2016-05-24 00:00:24] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: 1987763

[2016-05-24 00:00:29] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((1987763,2016-05-24T00:00:25.721+02:00)))
[2016-05-24 00:00:29] DEBUG report - [Store Agent Run Times #613] checking agent runs from SQL ID 1987763 [2016-05-24T00:00:25.721+02:00 - 2016-05-24T00:00:29.391+02:00]
[2016-05-24 00:00:29] DEBUG report - [Store Agent Run Times #613] (4 ms) Added or updated 0 agent runs
[2016-05-24 00:00:29] INFO  com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: 1987763

So obviously, if we don't get all agent runs in the process in charge to invalidate runs cache and update compliance, that can't go well.

Actions #1

Updated by François ARMAND over 8 years ago

So, it seems that the problem is linked with squeryl [ourSession/ourTransaction]. Not sure why, I can't see what is causing the problem, but by replacing ourSession by ourTransaction in StatusUpdateSquerylRepository, I'm not able to see any missed run.

I would say that squeryl is completly broken, and we NEED to get ride of it.

For now, I propose to just alway use ourTransaction, even if there is a performance overhead - at least, it's correct.
Since the problem seems to be with outSession, I propose to remove it directly in 2.11, and remove all occurence of it.

Actions #2

Updated by François ARMAND over 8 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder/pull/1101
Actions #3

Updated by François ARMAND over 8 years ago

So, it appears that I had two Rudder webapp updating concurrently the DB, leading to the behaviour.

I believe we can close that bug.

Actions #4

Updated by Nicolas CHARLES over 8 years ago

  • Status changed from Pending technical review to Rejected

I'm rejecting this bug, as it was caused by two servers running on the same DB - and i'm not confident into changing this part of code in released version

Actions

Also available in: Atom PDF