Bug #15011
Error at the end of a policy generation with too many nodes (closed)
Description
I got the following error after clearing caches with 2500 nodes:
[2019-06-01 12:59:22] WARN explain_compliance.a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4 - Received a run at 2019-06-01T12:57:11.000Z for node 'a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4' with configId '20190529-153902-d76426ef' but that node should be sending reports for configId 20190531-224745-355b5fdd
Jun 01, 2019 12:59:22 PM com.zaxxer.nuprocess.linux.LinuxProcess start
WARNING: Failed to start process
java.io.IOException: error=7, Argument list too long
    at com.zaxxer.nuprocess.internal.LibJava8.Java_java_lang_UNIXProcess_forkAndExec(Native Method)
    at com.zaxxer.nuprocess.linux.LinuxProcess.start(LinuxProcess.java:109)
    at com.zaxxer.nuprocess.linux.LinProcessFactory.createProcess(LinProcessFactory.java:40)
    at com.zaxxer.nuprocess.NuProcessBuilder.start(NuProcessBuilder.java:266)
    at com.normation.rudder.hooks.RunNuCommand$.run(RunNuCommand.scala:153)
    at com.normation.rudder.hooks.RunHooks$.$anonfun$asyncRun$3(RunHooks.scala:186)
    at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
[2019-06-01 12:59:22] INFO policy.generation - Policy generation completed in: 2921610 ms
[2019-06-01 12:59:22] ERROR policy.generation - Error when updating policy, reason was: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'.
stdout: stderr: ''
[2019-06-01 12:59:22] INFO policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed
[2019-06-01 12:59:22] ERROR policy.generation - Policy update error for process '111' at 2019-06-01 12:59:22: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'.
Cause
As explained in the comments, the problem is that we build a string of all updated node IDs and pass it to hooks through the environment variable RUDDER_NODE_IDS.
To do so, the JVM forks and passes the value in the process parameters. When the number of nodes grows large enough, we hit the ARG_MAX limit, which is platform specific.
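The failure can be reproduced outside of Rudder with a minimal sketch like the one below. It is not the code Rudder runs: the 3000000-byte size is an arbitrary value chosen to be larger than the limit discussed here, and the child command `sh -c 'exit 0'` is only a stand-in for a hook.

```shell
#!/bin/sh
# Build a ~3 MB string standing in for a very long list of node IDs
# (the size is an arbitrary assumption, picked to exceed the limit).
big=$(head -c 3000000 /dev/zero | tr '\0' 'x')

# A short value is passed to the child process without trouble.
small_status=0
RUDDER_NODE_IDS="node-1 node-2" sh -c 'exit 0' || small_status=$?

# The oversized value makes the exec of the child fail
# ("Argument list too long"), so the command returns a non-zero status.
big_status=0
RUDDER_NODE_IDS="$big" sh -c 'exit 0' 2>/dev/null || big_status=$?

echo "small value: exit status $small_status"
echo "big value:   exit status $big_status"
```

The child never starts in the second case, which matches the hook failing before producing any stdout or stderr.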
On Linux, it defaults to 1/4 of ulimit -s, and since ulimit -s is 8 MB by default, we have ARG_MAX=2097152 bytes.
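The arithmetic behind that figure can be checked with a short sketch. It assumes the default stack size is 8192 KiB (the unit `ulimit -s` usually reports in) and simply applies the 1/4 rule stated above; the `getconf ARG_MAX` line is only there for comparison and may print a different value on other platforms.

```shell
#!/bin/sh
# Default stack size in KiB (8192 KiB = 8 MiB, the default assumed in the text).
stack_kib=8192

# ARG_MAX taken as one quarter of the stack size, converted to bytes.
arg_max=$(( stack_kib * 1024 / 4 ))
echo "expected ARG_MAX: $arg_max bytes"   # prints 2097152

# Value reported by the running system, for comparison only.
echo "system ARG_MAX:   $(getconf ARG_MAX)"
```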
But it's a bit less clear-cut than that: if we increase ulimit -s to, for example, 16 MB, the limit seen from the JVM is still 2097152 bytes, at least for OpenJDK 1.8.131.
So perhaps it's a hardcoded limit, or perhaps the interpretation differs from JVM to JVM.
Solution
Given that increasing ulimit -s on Linux does not increase the size of the string we can pass to the child process, we need to change the way we pass the parameter.
For Rudder 5.0.12 and up (and all more recent branches), the RUDDER_NODE_IDS parameter is deprecated and we no longer document it in the hook template. It is replaced by a new documented parameter: RUDDER_NODE_IDS_PATH. That parameter contains the path to a file that can be sourced and that contains the list of updated nodes for that generation. Sourcing the file defines the variable RUDDER_NODE_IDS if needed.
To avoid breaking existing user hooks, we still define the undocumented RUDDER_NODE_IDS parameter, with the same format as in Rudder 5.0.11 and earlier, if:
- there are user hooks present and executable in /opt/rudder/etc/hooks.d/policy-generation-finished/ AND there are fewer than 3000 updated nodes
- OR the Rudder configuration file /opt/rudder/etc/rudder-web.properties contains the property rudder.hooks.policy-generation-finished.nodeids.compability=true
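The two conditions above can be read as a small decision function. The sketch below is only an illustration of that rule, not Rudder's actual implementation: the function name `legacy_node_ids_defined` and its three arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) when the legacy RUDDER_NODE_IDS
# variable would still be defined according to the rule described above.
#   $1: number of executable user hooks in policy-generation-finished
#   $2: number of nodes updated by the generation
#   $3: value of the compatibility property ("true" or anything else)
legacy_node_ids_defined() {
  user_hooks=$1
  updated_nodes=$2
  compat=$3

  # The compatibility property forces the legacy variable on.
  if [ "$compat" = "true" ]; then
    return 0
  fi

  # Otherwise it is kept only for user hooks and fewer than 3000 updated nodes.
  [ "$user_hooks" -gt 0 ] && [ "$updated_nodes" -lt 3000 ]
}

if legacy_node_ids_defined 1 2500 false; then
  echo "RUDDER_NODE_IDS still defined"
else
  echo "use RUDDER_NODE_IDS_PATH instead"
fi
```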
So in the general case, you don't have to do anything and everything will continue to work as before. You only have to act when you have more than 3000 nodes and personal hooks in policy-generation-finished.
In the latter case, you only need to source the file given in the RUDDER_NODE_IDS_PATH parameter (by default: /var/rudder/policy-generation-info/updated-nodeids) and use any of the variables defined in that file:
- RUDDER_UPDATED_POLICY_SERVER_IDS: the array of policy servers updated during the generation, sorted from root to immediate relays to farther relays
- RUDDER_UPDATED_NODE_IDS: the array of nodes updated during the generation, sorted alphanumerically
- RUDDER_NODE_IDS: the array of all updated elements, starting with policy servers, then simple nodes.
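A user hook adapted to this scheme could look like the sketch below. It assumes the sourced file uses bash array syntax; the sample file content and the node IDs in it are made up for illustration, and the fallback to a temporary sample file only exists so that the sketch runs outside of Rudder.

```shell
#!/bin/bash
# Stand-in for the file Rudder writes. Its exact format is an assumption here:
# three bash arrays matching the variables listed above, with made-up IDs.
sample_file="$(mktemp)"
cat > "$sample_file" <<'EOF'
RUDDER_UPDATED_POLICY_SERVER_IDS=(root relay-1)
RUDDER_UPDATED_NODE_IDS=(node-a node-b node-c)
RUDDER_NODE_IDS=("${RUDDER_UPDATED_POLICY_SERVER_IDS[@]}" "${RUDDER_UPDATED_NODE_IDS[@]}")
EOF

# In a real hook, Rudder provides RUDDER_NODE_IDS_PATH; fall back to the sample otherwise.
RUDDER_NODE_IDS_PATH="${RUDDER_NODE_IDS_PATH:-$sample_file}"

# Source the file to define the arrays in the current shell.
. "$RUDDER_NODE_IDS_PATH"

echo "policy servers updated: ${#RUDDER_UPDATED_POLICY_SERVER_IDS[@]}"
echo "nodes updated:          ${#RUDDER_UPDATED_NODE_IDS[@]}"

# Iterate over every updated element, policy servers first.
for id in "${RUDDER_NODE_IDS[@]}"; do
  echo "updated: $id"
done

rm -f "$sample_file"
```

Because the list is read from a file rather than passed through the environment, its size no longer counts against the limit on process arguments.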