Bug #26429
open
Changing the system group category hierarchy breaks Rudder
Description
Hi all,
Jetty suddenly fails to start. It retries several times and then gives up. The log in /var/log/rudder/webapp/*.stderrout.log shows:
2025-02-25 08:42:26+0100 ERROR bootchecks - Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
[2025-02-25T08:42:26.583+01:00] ERROR FATAL An error happen during Rudder boot. Rudder will stop now. Error: SystemError: An error occurred; cause was: jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
 -> com.normation.zio$ZioRuntime$.$anonfun$unsafeRun$1(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.unsafeRun(ZioCommons.scala:445)
 -> com.normation.zio$ZioRuntime$.runNow(ZioCommons.scala:428)
 -> com.normation.zio$UnsafeRun.runNow(ZioCommons.scala:454)
jakarta.servlet.UnavailableException: Error when checking for mandatory entries for 'root' server in the DIT. <- Inconsistency: Missing required entry 'nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration'. This is most likelly because Rudder was not initialized. Please run /opt/rudder/bin/rudder-init to set it up.
Rudder is certainly initialized. I searched the web for several hours for a solution but did not find one.
This behaviour appeared after updating to 8.2.3-ubuntu22.04, although I am fairly sure the problem already existed before and that restarting Rudder after the update merely brought it to light.
Why? Because I reset the server to a snapshot taken 7 days ago and it still did not work (even though 7 days ago the Rudder web page could still be loaded).
Rudder is running inside an unprivileged LXD container without security nesting (and has run there without problems for approx. 6 months).
Can you please assist?
Thank you in advance!
Bruno
Files
Updated by Intero Admin about 2 months ago
Sorry, a correction: rudder-server was updated from 8.2.3 to 8.2.4; the OS is Ubuntu 22.04.
Updated by François ARMAND about 2 months ago
Hello, thanks for reporting.
As a first step, we will check that the entry is there, and then if that is the case (which is likely, given what you tested with snapshot restoration), we will try to understand what is going on. It's a very basic and ancient integrity check, so understanding why it triggered will be interesting.
So, step 1: check that the LDAP content is correct.
Can you please run this on the root server:
ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
The password to use for the interactive prompt is located in /opt/rudder/etc/rudder-passwords.conf for the RUDDER_OPENLDAP_BIND_PASSWORD key.
In the command output, you should see several entries, among them the one that is seen as missing (dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration).
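The check described above can also be scripted. The following is only a sketch in Python, not part of Rudder: it assumes the ldapsearch output was produced with ldif-wrap=no (as in the command above) and saved as text, and the function names extract_dns and has_required_entry are made up for illustration. DNs are compared case-insensitively as a simplification, and base64-encoded "dn::" lines are not handled.

```python
# DN reported as missing by the boot check (copied from the error message).
REQUIRED_DN = (
    "nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,"
    "groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration"
)


def extract_dns(ldif_text):
    """Return the lower-cased DNs found on 'dn: ' lines of ldapsearch output."""
    dns = set()
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn: "):
            dns.add(line[4:].strip().lower())
    return dns


def has_required_entry(ldif_text, required_dn=REQUIRED_DN):
    """True if the required DN is among the entries of the search result."""
    return required_dn.lower() in extract_dns(ldif_text)
```

For example, has_required_entry(open("search-output.txt").read()) would return False for a search result that only contains the "# search result" trailer, where search-output.txt is a hypothetical file holding the saved command output.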
Updated by Intero Admin about 2 months ago
Hi Francois,
thank you very much for your assistance.
I have checked it; unfortunately there is no entry:
root@system: ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#
# search result
search: 2
result: 0 Success
# numResponses: 1
Do you know how this can happen?
To make things clear: after everything failed, I went back to a 7-day-old snapshot (the snapshot is of the whole VM, not only the LXD container), and the problem is still present there. After fiddling around a lot, I returned to that snapshot again, so the snapshot is the current state.
There is also a somewhat older snapshot of only the container, taken directly in LXD (from 08/01/2025). I have just returned to this snapshot and executed the command you mentioned:
ldapsearch -o ldif-wrap=no -H "ldap://localhost:389" -x -D "cn=Manager,cn=rudder-configuration" -W -b "groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration" -s one '1.1'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration> with scope oneLevel
# filter: (objectclass=*)
# requesting: 1.1
#
# special:all, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
# policyServer:root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=policyServer:root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
# hasPolicyServer-root, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
# special:all_policyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_policyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
# all-nodes-with-cfengine-agent, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: nodeGroupId=all-nodes-with-cfengine-agent,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
# special:all_exceptPolicyServers, SystemGroups, GroupRoot, Rudder, rudder-configuration
dn: ruleTarget=special:all_exceptPolicyServers,groupCategoryId=SystemGroups,groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration
There is even more output, and Jetty is working fine.
I think those entries got lost in LDAP. Do you know how this could happen?
Thank you in advance!
Bruno
Updated by François ARMAND about 2 months ago
OK, good that you found them again. The only cause I can see is some error during the upgrade process that led to the LDAP configuration not being kept and the data archive being lost. I'm not sure how that can happen, but perhaps answering "yes" to the dpkg question about installing the package's new configuration can lead to that state.
Still, you should at least have the backups made before the upgrade in /var/rudder/ldap/backup/
- you could look at the most recent one and see if it has the entry, and if not, work back through older ones to see if the date helps to correlate with some other oddities
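The backup walk suggested above could be sketched like this in Python. This is an illustration under assumptions, not a Rudder tool: it assumes the backups are LDIF files, either plain text or gzip-compressed (.gz), and it orders them by file modification time; the helper names read_ldif and backups_with_entry are made up. Wrapped LDIF lines (a newline followed by one space, per RFC 2849) are unfolded before searching, since a long DN may be split across lines in a backup.

```python
import gzip
from pathlib import Path

# DN reported as missing by the boot check (copied from the error message).
REQUIRED_DN = (
    "nodeGroupId=hasPolicyServer-root,groupCategoryId=SystemGroups,"
    "groupCategoryId=GroupRoot,ou=Rudder,cn=rudder-configuration"
)


def read_ldif(path):
    """Read a plain or gzip-compressed LDIF file and unfold wrapped lines."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    # A folded LDIF line continues on the next line after a single space.
    return text.replace("\r\n", "\n").replace("\n ", "")


def backups_with_entry(backup_dir, required_dn=REQUIRED_DN):
    """Yield (path, found) for every backup file, newest first."""
    files = sorted(
        (p for p in Path(backup_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    needle = "dn: " + required_dn.lower()
    for path in files:
        lines = read_ldif(path).lower().splitlines()
        yield path, any(line.strip() == needle for line in lines)
```

Iterating over backups_with_entry("/var/rudder/ldap/backup/") and printing each pair gives a newest-to-oldest list in which the first backup that still contains the entry marks roughly when it disappeared.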
Updated by Intero Admin about 2 months ago
Thank you! No, I don't think the issue was caused by the dpkg question. As mentioned, older snapshots had the same issue, and apt dist-upgrade was not run there.
OK, it seems this cannot be repaired ad hoc. Going back to the LXD snapshot works, as mentioned. But: since 01/08/2025 no directives, techniques etc. have been modified, but many computers (mostly reinstalled ones) were joined to Rudder, and they do not show up in the pending nodes.
So please allow me this one question: how can I cope with this? Is it possible to just delete all computers in Rudder and let them rejoin automatically by themselves?
Or does a factory reset of the agent have to be done on all nodes?
Thank you!
Updated by Vincent MEMBRÉ about 2 months ago
- Target version changed from 8.2.5 to 8.2.6
Updated by Intero Admin about 1 month ago
Hi everybody,
I think we managed to reproduce the problem:
When system groups are moved into another category, the LDAP DB crashes and Jetty doesn't start anymore, maybe because the system group can no longer be found. We think it should not be possible to move system groups (as it seems this isn't intended at all).
Maybe this is a bug. Can you please fix the issue so that other people don't run into it?
Thank you very much!
Updated by Nicolas CHARLES 29 days ago
Hi !
I'm confused as to how you changed the group category, as the UI doesn't allow it for system groups
Did you change it using the API?
Updated by Intero Admin 29 days ago
· Edited
- File 2025-03-20 12_46_04-Rudder - Node Groups Management - Vivaldi.png added
Updated by Intero Admin 29 days ago
- File deleted (2025-03-20 12_46_04-Rudder - Node Groups Management - Vivaldi.png)
Updated by Intero Admin 29 days ago
- File 2025-03-20 12_49_24-Rudder - Node Groups Management - Vivaldi.png added
Hi Nicolas,
I did it in the web ui. Please have a look at the attached screenshot.
Updated by Nicolas CHARLES 25 days ago
- Subject changed from Jetty suddenly fails to start -problem with ldap? to Changing the system group category hierarchy breaks Rudder
OK, thank you for the clarification.
Indeed, the system group category can be moved, and that should not be possible. I'm renaming this issue.
Updated by Intero Admin 25 days ago
Thank you, Nicolas, good move; this will help other users avoid running into this issue :)
Updated by François ARMAND 22 days ago
- Assignee set to Clark ANDRIANASOLO
- Priority changed from To review to 1 (highest)
- Effort required set to Very Small
It's likely that an "is system" check is missing for the category.
Updated by Clark ANDRIANASOLO 15 days ago
This is linked to #25348: system groups could be deleted, but that should not be possible, neither in the UI nor in the API.
Here they can be updated, in the UI and the API, but they shouldn't be.
So the fix is to disallow it as in #25348, and existing broken system groups should be moved to the "Root of the group and group categories" (parent ID: "GroupRoot").
In general, system groups are managed by the system; they should not be modified by the user in any direct way. That means:
- disallowing update of system categories in the API for POST /groups/categories/{groupCategoryId}
- removing the "Update" button and disabling the selector for the group category
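The guard and the repair step described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the actual Rudder fix (Rudder's webapp is Scala); the GroupCategory type, the is_system flag and both function names are hypothetical. Only the "GroupRoot" parent ID comes from this thread.

```python
from dataclasses import dataclass, replace

# Parent ID of the root group category, as named in this thread.
GROUP_ROOT_ID = "GroupRoot"


@dataclass(frozen=True)
class GroupCategory:
    """Hypothetical, simplified model of a group category."""
    id: str
    parent_id: str
    name: str
    is_system: bool = False


class SystemCategoryError(Exception):
    """Raised when a request tries to modify a system category."""


def update_category(current, requested):
    """Apply an update unless the target is a system category."""
    if current.is_system:
        raise SystemCategoryError(
            f"Category '{current.id}' is a system category and cannot be updated"
        )
    return requested


def repair_system_category(category):
    """Move a misplaced system category back under the root category."""
    if category.is_system and category.parent_id != GROUP_ROOT_ID:
        return replace(category, parent_id=GROUP_ROOT_ID)
    return category
```

The point of the sketch is that the "is system" check happens before any change is applied, so both the API endpoint and the UI can rely on the same refusal, while the repair function only touches system categories that are no longer under "GroupRoot".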
Updated by Clark ANDRIANASOLO 15 days ago
- Status changed from New to In progress
Updated by Clark ANDRIANASOLO 15 days ago
- Related to Bug #25348: Deleting CVE group is possible even if it is a system group added
Updated by Clark ANDRIANASOLO 14 days ago
- Status changed from In progress to Pending technical review
- Pull Request set to https://github.com/Normation/rudder/pull/6303
Updated by Clark ANDRIANASOLO 14 days ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|7d9cda9dc025414c710452ccb21eff34ae2233be.