Actions
Bug #13517
closedPolicy generation fails definitively when postgresql is shut down during policy generation
Bug #13517:
Policy generation fails definitively when postgresql is shut down during policy generation
Status:
Released
Priority:
N/A
Assignee:
Category:
Performance and scalability
Target version:
Pull Request:
Severity:
Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility:
Getting started - demo | first install | level 1 Techniques
Effort required:
Very Small
Priority:
122
Name check:
Fix check:
Regression:
Description
If Postgresql is shut down during a policy generation for more than 30 seconds, the policy generation fails, and cannot recover unless the webapp is restarted
To reproduce:- edit hook /opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check and add a "sleep 10" there
- trigger a full regeneration
- shut down postgresql during the generation
- wait 30 seconds
You'll get the following stacktrace
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30011ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85)
at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122)
at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339)
at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347)
at scalaz.concurrent.Task$.Try(Task.scala:457)
at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347)
at scalaz.concurrent.Future.step(Future.scala:119)
at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75)
at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159)
at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184)
at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99)
at com.normation.rudder.repository.jdbc.RudderPropertiesRepositoryImpl.getReportLoggerLastId(RudderPropertiesRepository.scala:63)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:105)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504)
at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:88)
at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305)
at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241)
at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240)
at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:88)
at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227)
at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190)
at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:262)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:52)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216)
at org.postgresql.Driver.makeConnection(Driver.java:404)
at org.postgresql.Driver.connect(Driver.java:272)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:346)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:196)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:442)
at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:73)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:620)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:606)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 common frames omitted
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.postgresql.core.PGStream.<init>(PGStream.java:61)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:144)
... 16 common frames omitted
[2018-09-18 15:19:43] ERROR com.normation.rudder.batch.AsyncDeploymentAgent - Policy update error for process '7' at 2018-09-18 15:19:43: Exception caught during policy update process: HikariPool-1 - Connection is not available, request timed out after 30002ms.
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85)
at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122)
at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339)
at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347)
at scalaz.concurrent.Task$.Try(Task.scala:457)
at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347)
at scalaz.concurrent.Future.step(Future.scala:119)
at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75)
at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159)
at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184)
at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99)
at com.normation.rudder.repository.jdbc.UpdateExpectedReportsJdbcRepository.saveNodeExpectedReports(ExpectedReportsJdbcRepository.scala:231)
at com.normation.rudder.services.policies.ExpectedReportsUpdateImpl.updateExpectedReports(ExpectedReportsUpdate.scala:205)
at com.normation.rudder.services.policies.PromiseGeneration_setExpectedReports.setExpectedReports(DeploymentService.scala:1099)
at com.normation.rudder.services.policies.PromiseGeneration_setExpectedReports.setExpectedReports$(DeploymentService.scala:1079)
at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.setExpectedReports(DeploymentService.scala:478)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$75(DeploymentService.scala:235)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$71(DeploymentService.scala:228)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$65(DeploymentService.scala:217)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$43(DeploymentService.scala:184)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$38(DeploymentService.scala:170)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$36(DeploymentService.scala:169)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$34(DeploymentService.scala:168)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$33(DeploymentService.scala:167)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$31(DeploymentService.scala:166)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$28(DeploymentService.scala:163)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$24(DeploymentService.scala:160)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$20(DeploymentService.scala:157)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$16(DeploymentService.scala:154)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$12(DeploymentService.scala:151)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$8(DeploymentService.scala:148)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$5(DeploymentService.scala:143)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$2(DeploymentService.scala:138)
at net.liftweb.common.Full.flatMap(Box.scala:808)
at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:137)
at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:121)
at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:478)
at com.normation.rudder.batch.AsyncDeploymentAgent$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentAgent.scala:350)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504)
at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496)
...
[2018-09-18 15:20:11] ERROR application - Error when executing [Store Agent Run Times] scheduler task started at 2018/09/18 15:19:41, ended at 2018/09/18 15:20:11. <- HikariPool-1 - Connection is not available, request timed out after 300
06ms.
[2018-09-18 15:20:13] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30001ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85)
at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122)
at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339)
at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347)
at scalaz.concurrent.Task$.Try(Task.scala:457)
at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347)
at scalaz.concurrent.Future.step(Future.scala:119)
at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75)
at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159)
at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184)
at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99)
at com.normation.rudder.repository.jdbc.EventLogJdbcRepository.saveEventLog(EventLogJdbcRepository.scala:88)
at com.normation.rudder.batch.AsyncDeploymentAgent$$anonfun$lowPriority$1.applyOrElse(AsyncDeploymentAgent.scala:292)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at net.liftweb.http.ListenerManager$$anonfun$listenerService$1.applyOrElse(CometActor.scala:214)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at scala.collection.MapLike.$anonfun$applyOrElse$1(MapLike.scala:163)
at scala.collection.MapLike.getOrElse(MapLike.scala:128)
at scala.collection.MapLike.getOrElse$(MapLike.scala:126)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at scala.collection.MapLike.applyOrElse(MapLike.scala:163)
at scala.collection.MapLike.applyOrElse$(MapLike.scala:162)
at scala.collection.AbstractMap.applyOrElse(Map.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at scala.collection.MapLike.$anonfun$applyOrElse$1(MapLike.scala:163)
at scala.collection.MapLike.getOrElse(MapLike.scala:128)
at scala.collection.MapLike.getOrElse$(MapLike.scala:126)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at scala.collection.MapLike.applyOrElse(MapLike.scala:163)
at scala.collection.MapLike.applyOrElse$(MapLike.scala:162)
at scala.collection.AbstractMap.applyOrElse(Map.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168)
at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504)
at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496)
at com.normation.rudder.batch.AsyncDeploymentAgent.execTranslate(AsyncDeploymentAgent.scala:100)
at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305)
at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241)
at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240)
at com.normation.rudder.batch.AsyncDeploymentAgent.around(AsyncDeploymentAgent.scala:100)
at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227)
at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190)
at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:262)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:52)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216)
at org.postgresql.Driver.makeConnection(Driver.java:404)
at org.postgresql.Driver.connect(Driver.java:272)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:346)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:196)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:442)
at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:73)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:620)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:606)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 common frames omitted
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.postgresql.core.PGStream.<init>(PGStream.java:61)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:144)
... 16 common frames omitted
Updated by François ARMAND over 7 years ago
- Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
- User visibility set to Getting started - demo | first install | level 1 Techniques
- Effort required set to Very Small
- Priority changed from 0 to 122
So, the problem is in `EventLogJdbcRepository.saveEventLog` and every other plance in the code where we use the code pattern:
sql"""....""".....attempt.transact(xa).unsafePerformSync
The problem here is that the "attempt" only catch errors in the SQL part, not when an error happens in the "transact" part.
We must use:
sql"""....""".....transact(xa).attempt.unsafePerformSync
Updated by François ARMAND over 7 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND over 7 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2040
Updated by Rudder Quality Assistant over 7 years ago
- Assignee changed from Nicolas CHARLES to François ARMAND
Updated by François ARMAND over 7 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|76eb3db4229903792943967839a9fd4e21dea836.
Updated by François ARMAND over 7 years ago
- Related to Bug #13588: Correct attempt.transact -> transact.attempt in SQL request (see #13517) added
Updated by Vincent MEMBRÉ over 7 years ago
Actions