Actions
Bug #18227
closedFiber error when postgres is unavaible breaks generation forever
Pull Request:
Severity:
Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility:
Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority:
90
Name check:
To do
Fix check:
Checked
Regression:
Description
One user had to shutdown postgres after rudder was up.
It leads to errors like that:
[2020-09-18 10:18:57+0200] INFO compliance - [metrics] global compliance (number of components): 5118 [p:0 s:2164 r:7 e:3 u:0 m:0 nr:0 na:172 rd:0 c:2342 ana:208 nc:222 ae:0 bpm:0] [2020-09-18 10:19:09+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, [2020-09-18 10:19:09+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30003 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2020-09-18 10:19:18+0200] WARN scheduled.job - Error when trying to clean log reports from report table. <- An error occured while deleting log reports <- HikariPool-1 - Connection is not available, request timed out after 30001ms. [2020-09-18 10:19:27+0200] ERROR report.cache - Error when updating compliance cache for nodes: [be8c60fb-2de5-44ad-8b7a-84acad91323d, 4d3d95a5-8e93-4021-9b98-e9daa97a5583, 31b367c9-3fd3-44b7-8a13-33a3b7a5c4fc, root, 7eabf174-21bc-4a16-8dc8-5aa168fa7316 -> [2020-09-18 10:19:31+0200] ERROR com.normation.rudder.services.ClearCacheServiceImpl - Error when logging cache event: SystemError: Error when persisting event log ClearCache; cause was: java.sql.SQLTransientConnectionException: HikariPool-1 - Connectio -> [2020-09-18 10:19:44+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, [2020-09-18 10:19:44+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30003 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2020-09-18 10:19:50+0200] ERROR scheduled.job - could not fetch last id, don't log anything, wait next run, cause is Error when parsing property 'reportLoggerLastId' [2020-09-18 10:20:01+0200] INFO policy.generation - Start policy generation, checking updated rules [2020-09-18 10:20:02+0200] INFO policy.generation - [metrics] Xmx:1.94 GB nodes:15 (cached:0) rules:34 (enabled:31) techniques:84 (enabled:81) directives:144 (enabled:130) groups:32 (dynamic:29) parameters:3 [2020-09-18 10:20:19+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, [2020-09-18 10:20:19+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30000 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2020-09-18 10:20:32+0200] ERROR historization - Error with update nodes historization process <- Could not update the nodes historization information in base. <- Fiber failed. A checked error was not handled. java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30004ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:689) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161) at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) at doobie.util.transactor$Transactor$fromDataSource$FromDataSourceUnapplied.$anonfun$apply$14(transactor.scala:280)
1/ the connection pool should have seen that postgres was down, why timeout ?
2/ we need to take care of that stack trace and print only a relevant message to user
3/ postgres down must not make rudder broken foreever. When the connection get back, everything should be fine again.
Updated by François ARMAND over 4 years ago
- Translation missing: en.field_tag_list set to Sponsored
- Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
- User visibility set to Operational - other Techniques | Rudder settings | Plugins
- Priority changed from 0 to 108
Updated by François ARMAND over 4 years ago
- Status changed from New to In progress
Updated by François ARMAND over 4 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/3220
Updated by François ARMAND over 4 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|59922234553df2c9d13bc7ef4b469b461a1a163d.
Updated by François ARMAND over 4 years ago
- Fix check changed from To do to Checked
Updated by Vincent MEMBRÉ over 4 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.1.5 which was released today.
Actions