Bug #26429

open

Jetty suddenly fails to start - problem with LDAP?

Added by Intero Admin 13 days ago. Updated 12 days ago.

Status:
New
Priority:
N/A
Assignee:
-
Category:
Web - Config management
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
To do
Regression:
No

Description

Hi all,

Jetty suddenly fails to start. It retries several times and then gives up. The log in /var/log/rudder/webapp/*.stderrout.log shows:

2025-02-25 08:42:26+0100 ERROR bootchecks - Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
[2025-02-25T08:42:26.583+01:00] ERROR FATAL An error happen during Rudder boot. Rudder will stop now. Error: SystemError: An error occurred; cause was: jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
 -> com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:428)
 -> com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:454)
jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
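To pull just the DN that the boot check reports as missing out of such a log, a small sketch like the following may help. The function name missing_entries is hypothetical and not part of Rudder; it only assumes the "Missing required entry '...'" wording shown in the log lines above.

```shell
#!/bin/sh
# missing_entries is a hypothetical helper, not a Rudder tool.
# It reads log text (from the files given as arguments, or from stdin)
# and prints each distinct DN quoted after "Missing required entry".
missing_entries() {
    sed -n "s/.*Missing required entry '\([^']*\)'.*/\1/p" "$@" | sort -u
}

# Usage:
#   missing_entries /var/log/rudder/webapp/*.stderrout.log
```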

Rudder is definitely initialized. I searched the web for several hours for a solution but did not find one.

This behaviour occurred after updating to 8.2.3-ubuntu22.04. However, I am fairly sure that the problem already existed before, and that restarting Rudder after the update merely brought it to light.
Why? Because I reset the server to a snapshot taken 7 days ago, and it still did not work (although 7 days ago the Rudder web page could still be loaded).

Rudder is running inside an unprivileged LXD container without security nesting (and has been running fine there for approx. 6 months until now).

Can you please assist?

Thank you in advance!

Bruno

Actions #1

Updated by Intero Admin 13 days ago

Sorry, the update of rudder-server was from 8.2.3 to 8.2.4; the OS is Ubuntu 22.04.

Actions #2

Updated by François ARMAND 13 days ago

Hello, thanks for reporting.

As a first step, we will check that the entry is there. If it is (which is likely, given what you tested with the snapshot restoration), we will then try to understand what is going on. It's a very basic and ancient integrity check, so understanding why it triggered will be interesting.

So, step 1: check that the LDAP content is correct.
Can you please run the following on the root server:

ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'

The password for the interactive prompt is the value of the RUDDER_OPENLDAP_BIND_PASSWORD key in /opt/rudder/etc/rudder-passwords.conf.

In the command output, you should see several entries, among them the one reported as missing (dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration).
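
If you would rather not scan the output by eye, here is a minimal sketch of automating that check. It assumes the ldapsearch output is piped in unwrapped (as with -o ldif-wrap=no in the command above); the function name has_dn is hypothetical and not part of Rudder or OpenLDAP.

```shell
#!/bin/sh
# has_dn is a hypothetical helper, not a Rudder or OpenLDAP tool.
# It reads ldapsearch output on stdin and succeeds if one line is exactly
# "dn: <DN>" (fixed-string match, compared case-insensitively).
has_dn() {
    grep -qixF "dn: $1"
}

# Usage (pipe the output of the ldapsearch command above into the helper):
#   <ldapsearch command> | has_dn "nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" \
#       && echo "entry present" || echo "entry missing"
```

An exact whole-line match is used so that a DN which merely contains the searched one as a suffix does not count as a hit.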

Actions #3

Updated by Intero Admin 13 days ago

Hi François,

thank you very much for your assistance.

I have checked it; unfortunately, there is no entry:

root@system: ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#

# search result
search: 2
result: 0 Success

# numResponses: 1

Do you know how this can happen?

To make things clear: after everything failed, I went back to a 7-day-old snapshot (a snapshot of the whole VM, not only of the LXD container), and the problem is still there. After fiddling around a lot, I returned to that snapshot again, so the snapshot is the current state.

There is also a somewhat older snapshot of only the container, taken directly in LXD (from 08/01/2025). I have just returned to this snapshot and executed the command you mentioned:

ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#

# special:all, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# policyServer:root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=policyServer:root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# hasPolicyServer-root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# special:all_policyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_policyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# all-nodes-with-cfengine-agent, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=all-nodes-with-cfengine-agent,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

# special:all_exceptPolicyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_exceptPolicyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration

There is even more output, and Jetty is working fine.
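
To see exactly which entries differ between the two states, the two ldapsearch outputs can be saved to files and compared. The following is only a sketch: the function name missing_dns and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# missing_dns is a hypothetical helper, not a Rudder tool.
# Arguments: a file with ldapsearch output from the working snapshot,
# then a file with ldapsearch output from the broken state.
# Prints the "dn:" lines that exist only in the working snapshot.
missing_dns() {
    good=$1
    bad=$2
    tmp_good=$(mktemp)
    tmp_bad=$(mktemp)
    grep '^dn: ' "$good" | sort -u > "$tmp_good"
    grep '^dn: ' "$bad"  | sort -u > "$tmp_bad"
    # comm -23 keeps the lines that appear only in the first (sorted) file.
    comm -23 "$tmp_good" "$tmp_bad"
    rm -f "$tmp_good" "$tmp_bad"
}

# Usage (file names are placeholders):
#   missing_dns ldap-working.txt ldap-broken.txt
```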

I think those entries got lost in LDAP. Do you know how this could happen?

Thank you in advance!

Bruno

Actions #4

Updated by François ARMAND 12 days ago

OK, good that you found them again. The only cause I can think of is some error during the upgrade process that led to the LDAP configuration not being kept and the data archive being lost. I'm not sure how that can happen, but perhaps you can reach that state if you answer "yes" to the dpkg question about installing the package's new configuration.
Still, you should at least have the backups made before the upgrade in /var/rudder/ldap/backup/. You could look at the most recent one and see if it has the entry; if not, work backwards through the older ones to see whether the date helps to correlate with some other oddities.
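
Walking through the backups could be scripted roughly as below. This is a sketch under assumptions: it assumes the backups are LDIF text files, optionally gzip-compressed, and it matches only the leading part of the DN because LDIF dumps may wrap long lines. The function name scan_backups is hypothetical.

```shell
#!/bin/sh
# scan_backups is a hypothetical helper, not a Rudder tool.
# Assumption: backups are LDIF text files, possibly gzip-compressed (*.gz).
# Arguments: the backup directory, then the leading RDN of the entry.
# Prints one line per file, newest first: "present <file>" or "missing <file>".
scan_backups() {
    backup_dir=$1
    rdn=$2
    # ls -t sorts by modification time, newest first.
    ls -t "$backup_dir" | while IFS= read -r name; do
        file=$backup_dir/$name
        [ -f "$file" ] || continue
        case $file in
            *.gz) reader="gzip -dc" ;;
            *)    reader="cat" ;;
        esac
        # Only the start of the DN is matched, so wrapped LDIF lines still hit.
        if $reader "$file" | grep -qiF "dn: $rdn,"; then
            echo "present $file"
        else
            echo "missing $file"
        fi
    done
}

# Usage:
#   scan_backups /var/rudder/ldap/backup "nodeGroupId=hasPolicyServer-root"
```

The first "present" line then gives the newest backup that still contains the entry, and its date can be compared with other events on the server.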

Actions #5

Updated by Intero Admin 12 days ago

Thank you! No, I don't think the issue was caused by the dpkg question. As mentioned, older snapshots had the same issue, and apt dist-upgrade was not run there.

OK, it seems this cannot be repaired ad hoc. Going back to the LXD snapshot works, as mentioned. But: since 01/08/2025 no directives, techniques, etc. have been modified, yet many computers (mostly ones that were reinstalled) were joined to Rudder, and they do not show up in the pending nodes.

So please allow me one question: how should I deal with this? Is it possible to just delete all computers in Rudder and let them rejoin automatically by themselves?
Or does a factory reset of the agent have to be performed on all nodes?

Thank you!

Actions #6

Updated by Vincent MEMBRÉ 12 days ago

  • Target version changed from 8.2.5 to 8.2.6