Bug #26429
open
Jetty suddenly fails to start - problem with LDAP?
Description
Hi all,
Jetty suddenly fails to start. It tries several times, but then fails. /var/log/rudder/webapp/*.stderrout.log shows:
2025-02-25 08:42:26+0100 ERROR bootchecks - Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
[2025-02-25T08:42:26.583+01:00] ERROR FATAL An error happen during Rudder boot. Rudder will stop now. Error: SystemError: An error occurred; cause was: jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
 -> com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:428)
 -> com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:454)
jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
Rudder is certainly initialized. I searched the web for several hours for a solution, but didn't find one.
This behaviour occurred after updating to 8.2.3-ubuntu22.04, although I am pretty sure that the problem already existed earlier and that restarting Rudder after the update merely brought it to light.
Why? Because I reset the server to a snapshot taken 7 days ago, and it still did not work (although 7 days ago, the Rudder web page could be loaded).
Rudder is running inside an unprivileged LXD container without security nesting (and has run there fine for approx. 6 months).
Can you please assist?
Thank you in advance!
Bruno
Updated by Intero Admin 12 days ago
Sorry, the update of rudder-server was from 8.2.3 to 8.2.4; the OS is Ubuntu 22.04.
Updated by François ARMAND 12 days ago
Hello, thanks for reporting.
As a first step, we will check that the entry is there, and then if that's the case (which is likely, given what you tested with snapshot restoration), we will try to understand what is going on. It's a very basic and ancient integrity check, so understanding why it triggered will be interesting.
So, step 1: check that the LDAP content is correct.
Can you please run on the root server:
ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
The password to use for the interactive prompt is located in /opt/rudder/etc/rudder-passwords.conf, under the RUDDER_OPENLDAP_BIND_PASSWORD key.
In the command output, you should see several entries, among them the one that is reported as missing (dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration).
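The check above could also be scripted. The following is only a minimal sketch: the helper names (read_conf_key, has_dn, check_root_group) are hypothetical, and it assumes that rudder-passwords.conf stores one "KEY:value" pair per line, which should be verified on the server first.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of Rudder.

BASE="groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration"

# Print the value stored for key $2 in file $1.
# Assumption: the file holds one "KEY:value" pair per line.
read_conf_key() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Succeed if the LDIF read from stdin contains exactly the DN $1.
# Relies on unwrapped output (-o ldif-wrap=no), so each DN fits on one line.
has_dn() {
    grep -qixF -e "dn: $1"
}

# Run the search from this thread without the interactive prompt
# and report whether the required entry is there.
check_root_group() {
    pass=$(read_conf_key /opt/rudder/etc/rudder-passwords.conf RUDDER_OPENLDAP_BIND_PASSWORD)
    if ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x \
        -D "cn=Manager,cn=rudder-configuration" -w "$pass" \
        -b "$BASE" -s one '1.1' | has_dn "nodeGroupId=hasPolicyServer-root,$BASE"; then
        echo "entry present"
    else
        echo "entry missing"
    fi
}
```

Calling check_root_group on the root server prints one of the two messages. Note that -w puts the password on the command line, where other local users may see it in the process list, and that a failed bind is also reported as "entry missing", so the interactive -W form remains the safer way to double-check.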
Updated by Intero Admin 12 days ago
Hi Francois,
thank you very much for your assistance.
I have checked it; unfortunately there is no entry:
root@system: ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#

# search result
search: 2
result: 0 Success

# numResponses: 1
Do you know how this can happen?
To make things clear: after everything failed, I reverted to a 7-day-old snapshot (the snapshot is of the whole VM, not only the LXD container), and the problem is still there. After fiddling around a lot, I returned to that snapshot again, so the snapshot is the current state.
There is also a somewhat older snapshot of only the container directly in LXD (from 08/01/2025). I have just now returned to this snapshot and executed the command you mentioned:
ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#

# special:all, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# policyServer:root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=policyServer:root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# hasPolicyServer-root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# special:all_policyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_policyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# all-nodes-with-cfengine-agent, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=all-nodes-with-cfengine-agent,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# special:all_exceptPolicyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_exceptPolicyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
There is even more output, and Jetty is working fine.
I think those entries got lost in LDAP. Do you know how this could happen?
Thank you in advance!
Bruno
Updated by François ARMAND 12 days ago
OK, good that you found them again. The only reason I can see that could lead to that is some error during the upgrade process that led to not keeping the LDAP configuration and losing the data archive. I'm not sure how it can happen, but perhaps if you answer "yes" to the dpkg question about taking the package's new configuration, you can reach that state.
Still, you should at least have the backups made before the upgrade in /var/rudder/ldap/backup/: you could look at the most recent one and see if it has the entry, and if not, iteratively look in older ones to see if the date helps to correlate with some other oddities.
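Walking through the backups from newest to oldest could be sketched as follows. This is an illustration under assumptions, not a Rudder tool: the function name find_entry_in_backups is hypothetical, and the backups are assumed to be plain-text LDIF files (compressed files would need zgrep or similar instead of grep).

```shell
#!/bin/sh
# Sketch only: find_entry_in_backups is a hypothetical helper.
# Assumption: every file in the backup directory is a plain-text LDIF dump.

# List the files in directory $1, newest first, and say for each one
# whether it contains an entry whose DN starts with the RDN $2.
# Only the leading RDN is matched, because LDIF dumps usually wrap long
# lines, so the full DN may be split across several lines.
find_entry_in_backups() {
    ls -t "$1" | while IFS= read -r f; do
        if grep -qiF -e "dn: $2," "$1/$f"; then
            echo "present  $f"
        else
            echo "missing  $f"
        fi
    done
}
```

For the entry discussed here, the call would be `find_entry_in_backups /var/rudder/ldap/backup nodeGroupId=hasPolicyServer-root`. The newest file listed as "present" gives an upper bound on when the entry was still in the directory.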
Updated by Intero Admin 12 days ago
Thank you! No, I don't think that the issue was caused by the dpkg question. As mentioned, older snapshots had the same issue, and apt dist-upgrade was not run there.
OK, it seems this cannot be repaired ad hoc. Going back to the LXD snapshot works, as mentioned. But: since 01/08/2025 no directives, techniques, etc. were modified, but many computers (mostly reinstalled ones) were joined to Rudder, and they don't show up in the pending nodes.
So please allow me one question: how do I cope with this? Is there a way to just delete all computers in Rudder and let them rejoin automatically by themselves?
Or does a factory reset of the agent have to be done on all nodes?
Thank you!
Updated by Vincent MEMBRÉ 12 days ago
- Target version changed from 8.2.5 to 8.2.6