Bug #15324
closedWebapp fails to boot (several time)
Description
When we install a new rudder server, it need to restart jetty 7 times before being successfull.
Here are the log:
Boot 1: suicide¶
First boot fails because LDAP is not ready and rudder kill itself. This seems legit.
[2019-07-26 09:49:23] ERROR scheduledJob - report logger could not fetch latest 100 non compliant reports, retry on next run: Could not get node information from database [2019-07-26 09:49:23] INFO bootchecks - LDAP and PostgreSQL connection are OK [2019-07-26 09:49:23] INFO bootchecks - Check mandatory DIT entries [2019-07-26 09:49:23] ERROR bootchecks - There is some required entries missing in the LDAP directory: ou=Pending Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Accepted Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Removed Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Rudder, cn=rudder-configuration [2019-07-26 09:49:23] ERROR application - Fatal error during boot, Rudder will stop now javax.servlet.UnavailableException: There is some required entries missing in the LDAP directory: ou=Pending Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Accepted Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Removed Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Rudder, cn=rudder-configuration at bootstrap.liftweb.checks.CheckDIT.FAIL$1(CheckDIT.scala:77) at bootstrap.liftweb.checks.CheckDIT.checks(CheckDIT.scala:106) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:99) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:91) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:91) at bootstrap.liftweb.RudderConfig$.init(RudderConfig.scala:860) at bootstrap.liftweb.LiftInitContextListener.contextInitialized(LiftInitContextListener.scala:109) at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:953) at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:558) at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:918) at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:370) at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1497) at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1459)
Boot 2: stall¶
In second boot, rudder fails to start after 1 minute and systemd kills it:
Jul 26 09:49:33 server systemd[1]: Starting Jetty Web Application Server... ..... Jul 26 09:50:34 server rudder-jetty.sh[12515]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:50:34 UTC 2019 Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1 Jul 26 09:50:34 server systemd[1]: Failed to start Jetty Web Application Server. Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Unit entered failed state. Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Failed with result 'exit-code'.
In webapp logs, the last lines are:
[2019-07-26 09:49:51] WARN scheduledJob - There is no reports to log [2019-07-26 09:49:51] WARN com.normation.rudder.repository.jdbc.RudderPropertiesRepositoryImpl - last id not present in database, create it with value 0
Boot 3 - stall¶
Same than previsous, but elsewhere. Rudder fails to start in 1 minutes:
Jul 26 09:50:35 server systemd[1]: Starting Jetty Web Application Server... ... Jul 26 09:51:35 server rudder-jetty.sh[12961]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:51:35 UTC 2019 Jul 26 09:51:35 server systemd[1]: Failed to start Jetty Web Application Server.
But app seems to block elsewhere (we get at least one generation, and errors on bad inventories):
[2019-07-26 09:50:57] ERROR com.normation.rudder.services.policies.WriteNodeCertificatesPemImpl - SystemError: An error occured; cause was: /var/rudder/lib/ssl/allnodecerts.pem.new [2019-07-26 09:50:58] ERROR com.normation.rudder.domain.nodes.NodeInfo - Error when trying to get the CFEngine-MD5 digest of CFEngine public key for node 'server.rudder.local' (root) <- unable to decode base64 string: invalid characters encountered in base64 data [2019-07-26 09:50:58] ERROR com.normation.rudder.domain.nodes.NodeInfo - Error when trying to get the sha-256 digest of CFEngine public key for node 'server.rudder.local' (root) <- unable to decode base64 string: invalid characters encountered in base64 data ... [2019-07-26 09:51:01] INFO policy.generation - Successful policy update '1' [started 2019-07-26 09:50:56 - ended 2019-07-26 09:51:01]
Boot 4 - stall¶
Again, app killed after 1min
Jul 26 09:51:36 server systemd[1]: Starting Jetty Web Application Server... ... Jul 26 09:52:36 server rudder-jetty.sh[14178]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:52:36 UTC 2019 Jul 26 09:52:36 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1 Jul 26 09:52:37 server systemd[1]: Failed to start Jetty Web Application Server. Jul 26 09:52:37 server systemd[1]: rudder-jetty.service: Unit entered failed state.
The last log are:
[2019-07-26 09:51:53] INFO bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one [2019-07-26 09:51:54] INFO migration - No migration detected by migration script (table 'migrationeventlog' is empty or does not exist) [2019-07-26 09:51:55] INFO com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_09-51-55.058)
Boot 5,6,7 - stalling again, again, and finally works¶
Here, we have 3 minutes before the next restart:
Jul 26 09:55:41 server systemd[1]: Starting Jetty Web Application Server... Jul 26 09:56:41 server rudder-jetty.sh[15847]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:56:41 UTC 2019 Jul 26 09:56:41 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1 Jul 26 09:56:42 server systemd[1]: Failed to start Jetty Web Application Server. Jul 26 09:56:42 server systemd[1]: rudder-jetty.service: Unit entered failed state. Jul 26 09:56:42 server systemd[1]: rudder-jetty.service: Failed with result 'exit-code'.
Things happen, but the last bootbecks is again about archive:
[2019-07-26 09:55:58] INFO bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one
The next one is exactly the same.
And the last one finally succeed with the archive.
[2019-07-26 10:00:58] INFO bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one [2019-07-26 10:00:58] DEBUG com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Archived Rule category: /var/rudder/configuration-repository/ruleCategories/category.xml [2019-07-26 10:00:59] INFO com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_10-00-59.360) [2019-07-26 10:00:59] DEBUG com.normation.rudder.repository.xml.GitRuleArchiverImpl - Archived rule: /var/rudder/configuration-repository/rules/32377fd7-02fd-43d0-aab7-28460a91347b.xml [2019-07-26 10:00:59] INFO com.normation.rudder.repository.xml.GitRuleArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_10-00-59.768) [2019-07-26 10:01:01] INFO com.normation.rudder.repository.xml.GitActiveTechniqueCategoryArchiverImpl - Create a new archive: GitPath(archives/directives/2019-07-26_10-01-01.400) ..... [2019-07-26 10:01:01] DEBUG com.normation.rudder.repository.xml.GitNodeGroupArchiverImpl - Archived node group category: /var/rudder/configuration-repository/groups/category.xml [2019-07-26 10:01:01] INFO com.normation.rudder.repository.xml.GitNodeGroupArchiverImpl - Create a new archive: GitPath(archives/groups/2019-07-26_10-01-01.660) [2019-07-26 10:01:01] DEBUG com.normation.rudder.repository.xml.GitParameterArchiverImpl - Archived parameter: /var/rudder/configuration-repository/parameters/rudder_file_edit_header.xml [2019-07-26 10:01:01] INFO com.normation.rudder.repository.xml.GitParameterArchiverImpl - Create a new archive: GitPath(archives/parameters/2019-07-26_10-01-01.794) [2019-07-26 10:01:02] INFO com.normation.rudder.repository.xml.ItemArchiveManagerImpl - Create a new archive: GitPath(archives/full/2019-07-26_10-01-02.108) [2019-07-26 10:01:02] INFO bootchecks - First full archive of configuration-repository items done ...
Updated by François ARMAND over 5 years ago
- Status changed from New to In progress
Updated by François ARMAND over 5 years ago
So, it seems that the blocking point is that boocheck:
[2019-07-31 13:40:21] INFO bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one
Afterward, threads accumulate and one of these two seems to be faulty (actually the second should have been called by the first):
"zio-default-blocking-10-2118413714" #87 daemon prio=5 os_prio=0 tid=0x00007f4b341c3800 nid=0x2c11 in Object.wait() [0x00007f4b27b7b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at zio.internal.OneShot.get(OneShot.scala:79) - locked <0x00000000c7b93c08> (a zio.internal.OneShot) at zio.Runtime.unsafeRunSync(Runtime.scala:70) at zio.Runtime.unsafeRunSync$(Runtime.scala:65) at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:291) at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:305) at com.normation.box$IOToBox.toBox(ZioCommons.scala:344) at bootstrap.liftweb.checks.CheckInitXmlExport.checks(CheckInitXmlExport.scala:72) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:99) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:91) at bootstrap.liftweb.SequentialImmediateBootStrapChecks$$Lambda$2109/108232476.apply(Unknown Source) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:91) at bootstrap.liftweb.RudderConfig$.$anonfun$init$2(RudderConfig.scala:864) at bootstrap.liftweb.RudderConfig$$$Lambda$2069/546271589.apply$mcV$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at zio.blocking.Blocking$Service.$anonfun$effectBlocking$5(Blocking.scala:133) at zio.blocking.Blocking$Service$$Lambda$427/800470253.apply(Unknown Source) at zio.internal.FiberContext.evaluateNow(FiberContext.scala:332) at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:592) at zio.internal.FiberContext$$Lambda$423/1775525066.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
"zio-default-blocking-16-2118413714" #309 daemon prio=5 os_prio=0 tid=0x00007f4b620e5000 nid=0x2cf9 in Object.wait() [0x00007f4b25469000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at zio.internal.OneShot.get(OneShot.scala:79) - locked <0x00000000c7b93380> (a zio.internal.OneShot) at zio.Runtime.unsafeRunSync(Runtime.scala:70) at zio.Runtime.unsafeRunSync$(Runtime.scala:65) at com.normation.zio$ZioRuntime$$anon$2.unsafeRunSync(ZioCommons.scala:291) at com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:305) at com.normation.rudder.repository.xml.GitArchiverUtils.getRootDirectory(GitArchiverUtils.scala:107) at com.normation.rudder.repository.xml.GitArchiverUtils.getRootDirectory$(GitArchiverUtils.scala:105) at com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl.getRootDirectory$lzycompute(GitRuleCategoryArchiver.scala:120) - locked <0x00000000c7099da0> (a com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl) at com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl.getRootDirectory(GitRuleCategoryArchiver.scala:120) at com.normation.rudder.repository.xml.ItemArchiveManagerImpl.$anonfun$exportRuleCategories$2(ItemArchiveManagerImpl.scala:140) at com.normation.rudder.repository.xml.ItemArchiveManagerImpl$$Lambda$2656/502585367.apply(Unknown Source) at zio.internal.FiberContext.evaluateNow(FiberContext.scala:698) at zio.internal.FiberContext.$anonfun$evaluateLater$1(FiberContext.scala:592) at zio.internal.FiberContext$$Lambda$423/1775525066.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Updated by François ARMAND over 5 years ago
Reproduced with:
object GitArchiveUtilMain { val GIT_ROOT = "/tmp/rudder-test/gitroot" def main(args: Array[String]): Unit = { println("Start") val util = new GitArchiverUtils { override def gitRepo: GitRepositoryProvider = new GitRepositoryProviderImpl(GIT_ROOT) override def gitRootDirectory: File = new File(GIT_ROOT) override def relativePath: String = "/" override def xmlPrettyPrinter: RudderPrettyPrinter = null override def encoding: String = null override def gitModificationRepository: GitModificationRepository = new GitModificationRepository { override def getCommits(modificationId: ModificationId): IOResult[Option[GitCommitId]] = None.succeed override def addCommit(commit: GitCommitId, modId: ModificationId): IOResult[DB.GitCommitJoin] = DB.GitCommitJoin(commit, modId).succeed } override def loggerName: String = "my-logger-logger" } println("util created") println("getRootDirectory: " + util.getRootDirectory.getAbsolutePath) // never reach that. println("End") } }
Updated by François ARMAND over 5 years ago
So, the problem is with the lazy val
for getRootDirectory
.
The lock on object may cause something strange with ZIO
Updated by François ARMAND over 5 years ago
So, minimozed here: https://gist.github.com/fanf/c7f627d7e49fc6c5e2c0df95e6687579
It most likely a bug in scala lazy val initialisation: https://docs.scala-lang.org/sips/improved-lazy-val-initialization.html
Updated by François ARMAND over 5 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/2355
Updated by Nicolas CHARLES over 5 years ago
- Assignee changed from Nicolas CHARLES to François ARMAND
Updated by François ARMAND over 5 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|10d843356b94cb62df6cd5922dd01c83112d29e5.
Updated by Vincent MEMBRÉ about 5 years ago
- Status changed from Pending release to Released
- Priority changed from 47 to 90
This bug has been fixed in Rudder 6.0.0~beta1 which was released today.