Actions
Bug #13517
closedPolicy generation fails definitively when postgresql is shut down during policy generation
Status:
Released
Priority:
N/A
Assignee:
Category:
Performance and scalability
Target version:
Pull Request:
Severity:
Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility:
Getting started - demo | first install | level 1 Techniques
Effort required:
Very Small
Priority:
122
Name check:
Fix check:
Regression:
Description
If Postgresql is shut down during a policy generation for more than 30 seconds, the policy generation fails, and cannot recover unless the webapp is restarted
To reproduce:- edit hook /opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check and add a "sleep 10" there
- trigger a full regeneration
- shut down postgresql during the generation
- wait 30 seconds
You'll get the following stacktrace
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30011ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146) at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85) at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122) at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339) at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347) at scalaz.concurrent.Task$.Try(Task.scala:457) at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347) at scalaz.concurrent.Future.step(Future.scala:119) at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75) at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159) at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184) at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99) at com.normation.rudder.repository.jdbc.RudderPropertiesRepositoryImpl.getReportLoggerLastId(RudderPropertiesRepository.scala:63) at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger$$anonfun$messageHandler$1.applyOrElse(AutomaticReportLogger.scala:105) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34) at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504) at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496) at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.execTranslate(AutomaticReportLogger.scala:88) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240) at com.normation.rudder.batch.AutomaticReportLogger$LAAutomaticReportLogger.around(AutomaticReportLogger.scala:88) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:262) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:52) at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216) at org.postgresql.Driver.makeConnection(Driver.java:404) at org.postgresql.Driver.connect(Driver.java:272) at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95) at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101) at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:346) at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:196) at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:442) at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:73) at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:620) at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:606) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 common frames omitted Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.postgresql.core.PGStream.<init>(PGStream.java:61) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:144) ... 16 common frames omitted [2018-09-18 15:19:43] ERROR com.normation.rudder.batch.AsyncDeploymentAgent - Policy update error for process '7' at 2018-09-18 15:19:43: Exception caught during policy update process: HikariPool-1 - Connection is not available, request timed out after 30002ms. java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146) at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85) at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122) at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339) at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347) at scalaz.concurrent.Task$.Try(Task.scala:457) at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347) at scalaz.concurrent.Future.step(Future.scala:119) at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75) at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159) at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184) at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99) at com.normation.rudder.repository.jdbc.UpdateExpectedReportsJdbcRepository.saveNodeExpectedReports(ExpectedReportsJdbcRepository.scala:231) at com.normation.rudder.services.policies.ExpectedReportsUpdateImpl.updateExpectedReports(ExpectedReportsUpdate.scala:205) at com.normation.rudder.services.policies.PromiseGeneration_setExpectedReports.setExpectedReports(DeploymentService.scala:1099) at com.normation.rudder.services.policies.PromiseGeneration_setExpectedReports.setExpectedReports$(DeploymentService.scala:1079) at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.setExpectedReports(DeploymentService.scala:478) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$75(DeploymentService.scala:235) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$71(DeploymentService.scala:228) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$65(DeploymentService.scala:217) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$43(DeploymentService.scala:184) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$38(DeploymentService.scala:170) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$36(DeploymentService.scala:169) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$34(DeploymentService.scala:168) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$33(DeploymentService.scala:167) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$31(DeploymentService.scala:166) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$28(DeploymentService.scala:163) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$24(DeploymentService.scala:160) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$20(DeploymentService.scala:157) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$16(DeploymentService.scala:154) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$12(DeploymentService.scala:151) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$8(DeploymentService.scala:148) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$5(DeploymentService.scala:143) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.$anonfun$deploy$2(DeploymentService.scala:138) at net.liftweb.common.Full.flatMap(Box.scala:808) at com.normation.rudder.services.policies.PromiseGenerationService.deploy(DeploymentService.scala:137) at com.normation.rudder.services.policies.PromiseGenerationService.deploy$(DeploymentService.scala:121) at com.normation.rudder.services.policies.PromiseGenerationServiceImpl.deploy(DeploymentService.scala:478) at com.normation.rudder.batch.AsyncDeploymentAgent$DeployerAgent$$anonfun$messageHandler$1.applyOrElse(AsyncDeploymentAgent.scala:350) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34) at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504) at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496) ... [2018-09-18 15:20:11] ERROR application - Error when executing [Store Agent Run Times] scheduler task started at 2018/09/18 15:19:41, ended at 2018/09/18 15:20:11. <- HikariPool-1 - Connection is not available, request timed out after 300 06ms. [2018-09-18 15:20:13] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30001ms. at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:591) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:194) at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146) at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:85) at doobie.util.transactor$DataSourceTransactor$DataSourceTransactorCtor$$anon$1.$anonfun$connect$2(transactor.scala:122) at scalaz.concurrent.Task$.$anonfun$delay$1(Task.scala:339) at scalaz.concurrent.Task$.$anonfun$suspend$2(Task.scala:347) at scalaz.concurrent.Task$.Try(Task.scala:457) at scalaz.concurrent.Task$.$anonfun$suspend$1(Task.scala:347) at scalaz.concurrent.Future.step(Future.scala:119) at scalaz.concurrent.Future.unsafePerformListen(Future.scala:75) at scalaz.concurrent.Future.unsafePerformAsync(Future.scala:159) at scalaz.concurrent.Future.unsafePerformSync(Future.scala:184) at scalaz.concurrent.Task.unsafePerformSync(Task.scala:99) at com.normation.rudder.repository.jdbc.EventLogJdbcRepository.saveEventLog(EventLogJdbcRepository.scala:88) at com.normation.rudder.batch.AsyncDeploymentAgent$$anonfun$lowPriority$1.applyOrElse(AsyncDeploymentAgent.scala:292) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34) at net.liftweb.http.ListenerManager$$anonfun$listenerService$1.applyOrElse(CometActor.scala:214) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168) at scala.collection.MapLike.$anonfun$applyOrElse$1(MapLike.scala:163) at scala.collection.MapLike.getOrElse(MapLike.scala:128) at scala.collection.MapLike.getOrElse$(MapLike.scala:126) at scala.collection.AbstractMap.getOrElse(Map.scala:59) at scala.collection.MapLike.applyOrElse(MapLike.scala:163) at scala.collection.MapLike.applyOrElse$(MapLike.scala:162) at scala.collection.AbstractMap.applyOrElse(Map.scala:59) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168) at scala.collection.MapLike.$anonfun$applyOrElse$1(MapLike.scala:163) at scala.collection.MapLike.getOrElse(MapLike.scala:128) at scala.collection.MapLike.getOrElse$(MapLike.scala:126) at scala.collection.AbstractMap.getOrElse(Map.scala:59) at scala.collection.MapLike.applyOrElse(MapLike.scala:163) at scala.collection.MapLike.applyOrElse$(MapLike.scala:162) at scala.collection.AbstractMap.applyOrElse(Map.scala:59) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:168) at net.liftweb.actor.LiftActor.execTranslate(LiftActor.scala:504) at net.liftweb.actor.LiftActor.execTranslate$(LiftActor.scala:496) at com.normation.rudder.batch.AsyncDeploymentAgent.execTranslate(AsyncDeploymentAgent.scala:100) at net.liftweb.actor.SpecializedLiftActor.proc2(LiftActor.scala:305) at net.liftweb.actor.SpecializedLiftActor.$anonfun$processMailbox$1(LiftActor.scala:227) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) at net.liftweb.actor.SpecializedLiftActor.around(LiftActor.scala:241) at net.liftweb.actor.SpecializedLiftActor.around$(LiftActor.scala:240) at com.normation.rudder.batch.AsyncDeploymentAgent.around(AsyncDeploymentAgent.scala:100) at net.liftweb.actor.SpecializedLiftActor.processMailbox(LiftActor.scala:227) at net.liftweb.actor.SpecializedLiftActor.$anonfun$$bang$3(LiftActor.scala:190) at net.liftweb.actor.LAScheduler$$anon$2$$anon$3.run(LiftActor.scala:76) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:262) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:52) at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216) at org.postgresql.Driver.makeConnection(Driver.java:404) at org.postgresql.Driver.connect(Driver.java:272) at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95) at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101) at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:346) at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:196) at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:442) at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:73) at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:620) at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:606) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 common frames omitted Caused by: java.net.ConnectException: Connection refused (Connection refused) at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.postgresql.core.PGStream.<init>(PGStream.java:61) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:144) ... 16 common frames omitted
Updated by François ARMAND about 6 years ago
- Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
- User visibility set to Getting started - demo | first install | level 1 Techniques
- Effort required set to Very Small
- Priority changed from 0 to 122
So, the problem is in `EventLogJdbcRepository.saveEventLog` and every other plance in the code where we use the code pattern:
sql"""....""".....attempt.transact(xa).unsafePerformSync
The problem here is that the "attempt" only catch errors in the SQL part, not when an error happens in the "transact" part.
We must use:
sql"""....""".....transact(xa).attempt.unsafePerformSync
Updated by François ARMAND about 6 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND about 6 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2040
Updated by Rudder Quality Assistant about 6 years ago
- Assignee changed from Nicolas CHARLES to François ARMAND
Updated by François ARMAND about 6 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|76eb3db4229903792943967839a9fd4e21dea836.
Updated by François ARMAND about 6 years ago
- Related to Bug #13588: Correct attempt.transact -> transact.attempt in SQL request (see #13517) added
Updated by Vincent MEMBRÉ about 6 years ago
- Status changed from Pending release to Released
Actions