Project

General

Profile

Actions

Bug #15324

closed

Webapp fails to boot (several time)

Added by François ARMAND over 5 years ago. Updated about 5 years ago.

Status:
Released
Priority:
N/A
Category:
System integration
Target version:
Severity:
Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility:
Getting started - demo | first install | Technique editor and level 1 Techniques
Effort required:
Priority:
90
Name check:
To do
Fix check:
To do
Regression:

Description

When we install a new rudder server, it need to restart jetty 7 times before being successfull.

Here are the log:

Boot 1: suicide

First boot fails because LDAP is not ready and rudder kill itself. This seems legit.

[2019-07-26 09:49:23] ERROR scheduledJob - report logger could not fetch latest 100 non compliant reports, retry on next run: Could not get node information from database
[2019-07-26 09:49:23] INFO  bootchecks - LDAP and PostgreSQL connection are OK
[2019-07-26 09:49:23] INFO  bootchecks - Check mandatory DIT entries
[2019-07-26 09:49:23] ERROR bootchecks - There is some required entries missing in the LDAP directory: ou=Pending Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Accepted Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Removed Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Rudder, cn=rudder-configuration
[2019-07-26 09:49:23] ERROR application - Fatal error during boot, Rudder will stop now
javax.servlet.UnavailableException: There is some required entries missing in the LDAP directory: ou=Pending Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Accepted Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Removed Inventories, ou=Inventories, cn=rudder-configuration | ou=Inventories, cn=rudder-configuration | ou=Rudder, cn=rudder-configuration
        at bootstrap.liftweb.checks.CheckDIT.FAIL$1(CheckDIT.scala:77)
        at bootstrap.liftweb.checks.CheckDIT.checks(CheckDIT.scala:106)
        at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1(BootstrapChecks.scala:99)
        at bootstrap.liftweb.SequentialImmediateBootStrapChecks.$anonfun$checks$1$adapted(BootstrapChecks.scala:91)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at bootstrap.liftweb.SequentialImmediateBootStrapChecks.checks(BootstrapChecks.scala:91)
        at bootstrap.liftweb.RudderConfig$.init(RudderConfig.scala:860)
        at bootstrap.liftweb.LiftInitContextListener.contextInitialized(LiftInitContextListener.scala:109)
        at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:953)
        at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:558)
        at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:918)
        at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:370)
        at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1497)
        at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1459)

Boot 2: stall

In second boot, rudder fails to start after 1 minute and systemd kills it:

Jul 26 09:49:33 server systemd[1]: Starting Jetty Web Application Server...
.....
Jul 26 09:50:34 server rudder-jetty.sh[12515]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:50:34 UTC 2019
Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1
Jul 26 09:50:34 server systemd[1]: Failed to start Jetty Web Application Server.
Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Unit entered failed state.
Jul 26 09:50:34 server systemd[1]: rudder-jetty.service: Failed with result 'exit-code'.

In webapp logs, the last lines are:

[2019-07-26 09:49:51] WARN  scheduledJob - There is no reports to log
[2019-07-26 09:49:51] WARN  com.normation.rudder.repository.jdbc.RudderPropertiesRepositoryImpl - last id not present in database, create it with value 0

Boot 3 - stall

Same than previsous, but elsewhere. Rudder fails to start in 1 minutes:

Jul 26 09:50:35 server systemd[1]: Starting Jetty Web Application Server...
...
Jul 26 09:51:35 server rudder-jetty.sh[12961]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:51:35 UTC 2019
Jul 26 09:51:35 server systemd[1]: Failed to start Jetty Web Application Server.

But app seems to block elsewhere (we get at least one generation, and errors on bad inventories):

[2019-07-26 09:50:57] ERROR com.normation.rudder.services.policies.WriteNodeCertificatesPemImpl - SystemError: An error occured; cause was: /var/rudder/lib/ssl/allnodecerts.pem.new
[2019-07-26 09:50:58] ERROR com.normation.rudder.domain.nodes.NodeInfo - Error when trying to get the CFEngine-MD5 digest of CFEngine public key for node 'server.rudder.local' (root) <- unable to decode base64 string: invalid characters encountered in base64 data
[2019-07-26 09:50:58] ERROR com.normation.rudder.domain.nodes.NodeInfo - Error when trying to get the sha-256 digest of CFEngine public key for node 'server.rudder.local' (root) <- unable to decode base64 string: invalid characters encountered in base64 data
...
[2019-07-26 09:51:01] INFO  policy.generation - Successful policy update '1' [started 2019-07-26 09:50:56 - ended 2019-07-26 09:51:01]

Boot 4 - stall

Again, app killed after 1min

Jul 26 09:51:36 server systemd[1]: Starting Jetty Web Application Server...
...

Jul 26 09:52:36 server rudder-jetty.sh[14178]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:52:36 UTC 2019
Jul 26 09:52:36 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1
Jul 26 09:52:37 server systemd[1]: Failed to start Jetty Web Application Server.
Jul 26 09:52:37 server systemd[1]: rudder-jetty.service: Unit entered failed state.

The last log are:

[2019-07-26 09:51:53] INFO  bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one
[2019-07-26 09:51:54] INFO  migration - No migration detected by migration script (table 'migrationeventlog' is empty or does not exist)
[2019-07-26 09:51:55] INFO  com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_09-51-55.058)

Boot 5,6,7 - stalling again, again, and finally works

Here, we have 3 minutes before the next restart:

Jul 26 09:55:41 server systemd[1]: Starting Jetty Web Application Server...

Jul 26 09:56:41 server rudder-jetty.sh[15847]: Starting Jetty: . . . . . . . . . . . . . . . FAILED Fri Jul 26 09:56:41 UTC 2019
Jul 26 09:56:41 server systemd[1]: rudder-jetty.service: Control process exited, code=exited status=1
Jul 26 09:56:42 server systemd[1]: Failed to start Jetty Web Application Server.
Jul 26 09:56:42 server systemd[1]: rudder-jetty.service: Unit entered failed state.
Jul 26 09:56:42 server systemd[1]: rudder-jetty.service: Failed with result 'exit-code'.

Things happen, but the last bootbecks is again about archive:

[2019-07-26 09:55:58] INFO  bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one

The next one is exactly the same.

And the last one finally succeed with the archive.

[2019-07-26 10:00:58] INFO  bootchecks - No full archive of configuration-repository items seems to have been done, initialising the system with one
[2019-07-26 10:00:58] DEBUG com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Archived Rule category: /var/rudder/configuration-repository/ruleCategories/category.xml
[2019-07-26 10:00:59] INFO  com.normation.rudder.rule.category.GitRuleCategoryArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_10-00-59.360)
[2019-07-26 10:00:59] DEBUG com.normation.rudder.repository.xml.GitRuleArchiverImpl - Archived rule: /var/rudder/configuration-repository/rules/32377fd7-02fd-43d0-aab7-28460a91347b.xml
[2019-07-26 10:00:59] INFO  com.normation.rudder.repository.xml.GitRuleArchiverImpl - Create a new archive: GitPath(archives/configurations-rules/2019-07-26_10-00-59.768)
[2019-07-26 10:01:01] INFO  com.normation.rudder.repository.xml.GitActiveTechniqueCategoryArchiverImpl - Create a new archive: GitPath(archives/directives/2019-07-26_10-01-01.400)
.....
[2019-07-26 10:01:01] DEBUG com.normation.rudder.repository.xml.GitNodeGroupArchiverImpl - Archived node group category: /var/rudder/configuration-repository/groups/category.xml
[2019-07-26 10:01:01] INFO  com.normation.rudder.repository.xml.GitNodeGroupArchiverImpl - Create a new archive: GitPath(archives/groups/2019-07-26_10-01-01.660)
[2019-07-26 10:01:01] DEBUG com.normation.rudder.repository.xml.GitParameterArchiverImpl - Archived parameter: /var/rudder/configuration-repository/parameters/rudder_file_edit_header.xml
[2019-07-26 10:01:01] INFO  com.normation.rudder.repository.xml.GitParameterArchiverImpl - Create a new archive: GitPath(archives/parameters/2019-07-26_10-01-01.794)
[2019-07-26 10:01:02] INFO  com.normation.rudder.repository.xml.ItemArchiveManagerImpl - Create a new archive: GitPath(archives/full/2019-07-26_10-01-02.108)
[2019-07-26 10:01:02] INFO  bootchecks - First full archive of configuration-repository items done
...

Subtasks 3 (0 open3 closed)

Bug #15376: Forget to fork backend batchReleasedNicolas CHARLESActions
Bug #15381: Add more info on LDAP backend errorReleasedVincent MEMBRÉActions
Bug #15383: Unhandled error if the LDAP connection is unavailableReleasedVincent MEMBRÉActions
Actions

Also available in: Atom PDF