Bug #8363
closedBug #8051: Compliance is not correctly computed if we receive run agent right after generation
Some agent runs are not seen in the update process
Description
We have a process that looks for new execution reports, and from that update correspondeng agent runs, setting them as closed when needed.
But it seems that sometime, especially for long runs (>10s), the last report id is not well found back. More precissely, we get logs like that, with the inconsistencies highlighted with ** and @@:
[2016-05-24 00:00:14] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((1987663,2016-05-24T00:00:10.477+02:00))) [2016-05-24 00:00:14] DEBUG report - [Store Agent Run Times #610] checking agent runs from SQL ID 1987663 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:14.018+02:00] [2016-05-24 00:00:14] TRACE com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update runs for nodes [root] [2016-05-24 00:00:14] WARN com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update agent run cache for nodes [] [2016-05-24 00:00:14] DEBUG report - [Store Agent Run Times #610] (283 ms) Added or updated 1 agent runs, up to SQL ID 1987665 (last run time was 2016-05-24T00:00:10.477+02:00) [2016-05-24 00:00:14] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: **1987665** [2016-05-24 00:00:14] DEBUG com.normation.rudder.services.reports.CachedReportingServiceImpl - Compliance cache: invalidate cache for nodes: [] [2016-05-24 00:00:14] WARN report - Hooks execution time: 3 milliseconds [2016-05-24 00:00:19] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((**1987677**,2016-05-24T00:00:10.477+02:00))) [2016-05-24 00:00:19] DEBUG report - [Store Agent Run Times #611] checking agent runs from SQL ID 1987677 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:19.306+02:00] [2016-05-24 00:00:19] TRACE com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update runs for nodes [root] [2016-05-24 00:00:19] WARN com.normation.rudder.reports.execution.CachedReportsExecutionRepository - Update agent run cache for nodes [] [2016-05-24 00:00:19] DEBUG report - [Store Agent Run Times #611] (65 ms) Added or updated 1 agent runs, up to SQL ID 1987750 (last run time was 2016-05-24T00:00:10.477+02:00) [2016-05-24 00:00:19] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: @@1987750@@ [2016-05-24 00:00:19] DEBUG com.normation.rudder.services.reports.CachedReportingServiceImpl - Compliance cache: invalidate cache for nodes: [] [2016-05-24 00:00:19] WARN report - Hooks execution time: 3 milliseconds [2016-05-24 00:00:19] DEBUG com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger - Writting non-compliant-report logs beetween ids 1987664 and 1987761 (both incuded) [2016-05-24 00:00:19] DEBUG com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger - Wrote non-compliant-reports logs from id '1987664' to id '1987761' [2016-05-24 00:00:24] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((@@1987763@@,2016-05-24T00:00:10.477+02:00))) [2016-05-24 00:00:24] DEBUG report - [Store Agent Run Times #612] checking agent runs from SQL ID 1987763 [2016-05-24T00:00:10.477+02:00 - 2016-05-24T00:00:24.377+02:00] [2016-05-24 00:00:24] DEBUG report - [Store Agent Run Times #612] (4 ms) Added or updated 0 agent runs [2016-05-24 00:00:24] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: 1987763 [2016-05-24 00:00:29] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - Get run last id Full(Some((1987763,2016-05-24T00:00:25.721+02:00))) [2016-05-24 00:00:29] DEBUG report - [Store Agent Run Times #613] checking agent runs from SQL ID 1987763 [2016-05-24T00:00:25.721+02:00 - 2016-05-24T00:00:29.391+02:00] [2016-05-24 00:00:29] DEBUG report - [Store Agent Run Times #613] (4 ms) Added or updated 0 agent runs [2016-05-24 00:00:29] INFO com.normation.rudder.reports.statusUpdate.StatusUpdateSquerylRepository - update run last id: 1987763
So obviously, if we don't get all agent runs in the process in charge to invalidate runs cache and update compliance, that can't go well.
Updated by François ARMAND over 8 years ago
So, it seems that the problem is linked with squeryl [ourSession/ourTransaction]. Not sure why, I can't see what is causing the problem, but by replacing ourSession by ourTransaction in StatusUpdateSquerylRepository, I'm not able to see any missed run.
I would say that squeryl is completly broken, and we NEED to get ride of it.
For now, I propose to just alway use ourTransaction, even if there is a performance overhead - at least, it's correct.
Since the problem seems to be with outSession, I propose to remove it directly in 2.11, and remove all occurence of it.
Updated by François ARMAND over 8 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/1101
Updated by François ARMAND over 8 years ago
So, it appears that I had two Rudder webapp updating concurrently the DB, leading to the behaviour.
I believe we can close that bug.
Updated by Nicolas CHARLES over 8 years ago
- Status changed from Pending technical review to Rejected
I'm rejecting this bug, as it was caused by two servers running on the same DB - and i'm not confident into changing this part of code in released version