Bug #26464
StackOverflow in NodeStatusReports event computing
Description
On load, we get a StackOverflowError in ComputeNodeStatusReportServiceImpl:
2025-03-04 06:40:42+0100 INFO policy.generation.timing - Policy generation succeeded in: 1 min 49 s
2025-03-04 06:40:42+0100 INFO policy.generation.manager - Successful policy update '740168' [started 2025-03-04 06:38:53 - ended 2025-03-04 06:40:42]
java.lang.StackOverflowError
    at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:520)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.$anonfun$groupQueueActionByType$1(ComputeNodeStatusReportService.scala:373)
    at scala.Option.map(Option.scala:242)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.groupQueueActionByType(ComputeNodeStatusReportService.scala:372)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.$anonfun$groupQueueActionByType$1(ComputeNodeStatusReportService.scala:373)
    at scala.Option.map(Option.scala:242)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.groupQueueActionByType(ComputeNodeStatusReportService.scala:372)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.$anonfun$groupQueueActionByType$1(ComputeNodeStatusReportService.scala:373)
    at scala.Option.map(Option.scala:242)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.groupQueueActionByType(ComputeNodeStatusReportService.scala:372)
    at com.normation.rudder.services.reports.ComputeNodeStatusReportServiceImpl.$anonfun$groupQueueActionByType$1(ComputeNodeStatusReportService.scala:373)
    at scala.Option.map(Option.scala:242)
(and loop on the last 3 lines)
WORKAROUND
This can be worked around by increasing the stack size, which also points to real system contention rather than a logic bug:
=> add -Xss64m
to the GC parameters in @/etc/default/rudder-jetty@
Jetty may then refuse to start, because systemd kills it before it has finished processing the backlog of old items.
You may need to force-stop Jetty, wait for the agent to repair things, and possibly wait for a couple of generation/report-processing cycles before compliance converges back to green.
It looks like a real stack overflow because the groupBy is not stack safe, but we need to investigate, understand the root cause, and correct it.
The observed instance was Rudder 8.2.4, but nothing has changed in this code in more recent versions.
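To illustrate what "not stack safe" means for a grouping function whose trace alternates between the function itself and Option.map, here is a minimal Scala sketch. It is not the actual Rudder code and not the fix from the pull request: the QueueAction types, the GroupByType object and both method names are hypothetical stand-ins chosen for the example. The first version recurses from inside Option.map, so the recursive call is not in tail position and each group adds stack frames; the second uses an accumulator with @tailrec and runs at constant stack depth.

```scala
import scala.annotation.tailrec

// Hypothetical stand-ins for the real queue actions (illustrative names only).
sealed trait QueueAction
final case class UpdateCompliance(nodeId: String) extends QueueAction
final case class ExpireReports(nodeId: String)    extends QueueAction

object GroupByType {
  // Two actions belong to the same group when they have the same concrete type.
  private def sameType(a: QueueAction, b: QueueAction): Boolean =
    a.getClass == b.getClass

  // Groups consecutive actions of the same type.
  // The recursive call happens inside Option.map, so it is NOT a tail call:
  // stack depth grows with the number of groups in the queue.
  def groupUnsafe(queue: List[QueueAction]): List[List[QueueAction]] =
    queue.headOption.map { head =>
      val (same, rest) = queue.span(sameType(head, _))
      same :: groupUnsafe(rest)
    }.getOrElse(Nil)

  // Same grouping with an accumulator. @tailrec makes the compiler reject
  // the method if the recursive call is not in tail position.
  def groupSafe(queue: List[QueueAction]): List[List[QueueAction]] = {
    @tailrec
    def loop(
        remaining: List[QueueAction],
        acc:       List[List[QueueAction]]
    ): List[List[QueueAction]] =
      remaining match {
        case Nil       => acc.reverse
        case head :: _ =>
          val (same, rest) = remaining.span(sameType(head, _))
          loop(rest, same :: acc)
      }
    loop(queue, Nil)
  }
}
```

Both versions return the same groups on small inputs. On a long queue whose action types keep alternating (many small groups), groupSafe completes with the default stack size, while groupUnsafe needs a stack proportional to the number of groups, which is consistent with a larger -Xss only postponing the failure.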
Updated by Vincent MEMBRÉ 5 days ago
- Status changed from New to In progress
- Assignee set to Vincent MEMBRÉ
Updated by François ARMAND 3 days ago
- Status changed from In progress to Pending technical review
- Priority changed from To review to 1 (highest)
- Pull Request set to https://github.com/Normation/rudder/pull/6224
Updated by Anonymous 2 days ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|9d99c14282b51d43f21c5e9c7afd1c1849f2aee2.