Error at the end of a policy generation with too many nodes
I got the following error after clearing the cache with 2500 nodes:
[2019-06-01 12:59:22] WARN explain_compliance.a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4 - Received a run at 2019-06-01T12:57:11.000Z for node 'a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4' with configId '20190529-153902-d76426ef' but that node should be sending reports for configId 20190531-224745-355b5fdd
Jun 01, 2019 12:59:22 PM com.zaxxer.nuprocess.linux.LinuxProcess start
WARNING: Failed to start process
java.io.IOException: error=7, Argument list too long
    at com.zaxxer.nuprocess.internal.LibJava8.Java_java_lang_UNIXProcess_forkAndExec(Native Method)
    at com.zaxxer.nuprocess.linux.LinuxProcess.start(LinuxProcess.java:109)
    at com.zaxxer.nuprocess.linux.LinProcessFactory.createProcess(LinProcessFactory.java:40)
    at com.zaxxer.nuprocess.NuProcessBuilder.start(NuProcessBuilder.java:266)
    at com.normation.rudder.hooks.RunNuCommand$.run(RunNuCommand.scala:153)
    at com.normation.rudder.hooks.RunHooks$.$anonfun$asyncRun$3(RunHooks.scala:186)
    at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
[2019-06-01 12:59:22] INFO policy.generation - Policy generation completed in: 2921610 ms
[2019-06-01 12:59:22] ERROR policy.generation - Error when updating policy, reason was: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'.
stdout: stderr: ''
[2019-06-01 12:59:22] INFO policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed
[2019-06-01 12:59:22] ERROR policy.generation - Policy update error for process '111' at 2019-06-01 12:59:22: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'.
As explained in the comments, the problem is that we build a string of all updated node IDs and pass it to the hook through an environment variable.
To do so, the JVM forks and passes the value in the process parameters. When the number of nodes grows large enough, we hit the
ARG_MAX limit, which is platform specific.
On Linux, it defaults to 1/4 of ulimit -s, and since ulimit -s is 8 MB by default, we have 2097152 bytes.
But it's a little less clear than that: if we increase ulimit -s to, for example, 16 MB, the limit seen from the JVM is still 2097152 bytes, at least for OpenJDK 1.8.131.
So perhaps it's a hardcoded limit, or perhaps the interpretation differs from JVM to JVM.
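As a quick check of the arithmetic above, here is a minimal sketch. ulimit -s reports its value in kB, so the default of 8192 kB gives 8192 * 1024 / 4 = 2097152 bytes, and a 16 MB stack would give twice that in theory. The function name quarter_of_stack_bytes is made up for this example; the last two lines only print what the current system reports, which may differ from what the JVM observes.

```shell
#!/bin/bash
# Convert a stack limit expressed in kB (the unit printed by `ulimit -s`)
# into one quarter of that limit, expressed in bytes.
# Hypothetical helper, for illustration only.
quarter_of_stack_bytes() {
  local stack_kb="$1"
  echo $(( stack_kb * 1024 / 4 ))
}

quarter_of_stack_bytes 8192    # default 8 MB stack  -> 2097152
quarter_of_stack_bytes 16384   # 16 MB stack         -> 4194304

# Values reported by the current system, for comparison.
echo "ulimit -s:       $(ulimit -s)"
echo "getconf ARG_MAX: $(getconf ARG_MAX)"
```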
Given that increasing ulimit -s on Linux does not increase the size of the string we can pass to the child process, we need to change the way we pass the parameter.
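The difference between the two ways of passing the value can be reproduced outside of Rudder with a small sketch. It assumes bash on a Linux-like system; the names pass_by_env, pass_by_file, DEMO_NODE_IDS and DEMO_NODE_IDS_PATH are invented for the example and are not Rudder parameters. The 3,000,000-byte string is simply chosen to be above the 2097152-byte limit discussed here.

```shell
#!/bin/bash
# Sketch: compare passing a large value through the environment with passing
# only the path of a file that holds it. All names are illustrative.

true_bin="$(type -P true)"   # external binary, so that a real exec happens

# Try to start a child process with the whole value in its environment.
# Returns non-zero if the child could not be started.
pass_by_env() {
  local value="$1"
  DEMO_NODE_IDS="$value" "$true_bin" 2>/dev/null
}

# Write the value to a file and give the child only the file path.
pass_by_file() {
  local value="$1" file status
  file="$(mktemp)" || return 1
  printf '%s\n' "$value" > "$file"   # printf is a builtin: no exec involved
  DEMO_NODE_IDS_PATH="$file" "$true_bin"
  status=$?
  rm -f "$file"
  return "$status"
}

# Build a 3,000,000-byte string of spaces.
printf -v big_value '%*s' 3000000 ''

if pass_by_env "$big_value"; then
  echo "env:  child started"
else
  echo "env:  child failed to start"
fi

if pass_by_file "$big_value"; then
  echo "file: child started"
else
  echo "file: child failed to start"
fi
```

With the value in a file, the size of the environment no longer depends on the number of nodes, which is the reason for moving to a path parameter.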
For Rudder 5.0.12 and up (and all more recent branches), the RUDDER_NODE_IDS parameter is deprecated and no longer documented in the hook template. It is replaced by a new documented parameter: RUDDER_NODE_IDS_PATH. That parameter contains the path to a file that can be sourced and that contains the list of nodes updated during that generation. Sourcing the file defines the RUDDER_NODE_IDS variable if needed.
To avoid breaking existing user hooks, we still define the undocumented RUDDER_NODE_IDS parameter, with the same format as in Rudder 5.0.11 and earlier, if:
- there are user hooks present and executable in /opt/rudder/etc/hooks.d/policy-generation-finished/ AND there are fewer than 3000 updated nodes
- OR the Rudder configuration file /opt/rudder/etc/rudder-web.properties contains property
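The two conditions above can be sketched as a small shell function. This is only an illustration of the rule, not Rudder's actual implementation: the function names and the compat_enabled argument are hypothetical, the property itself is represented by a plain "true"/"false" flag, and the sketch counts any executable file in the directory as a hook without trying to tell user hooks apart from those shipped with Rudder.

```shell
#!/bin/bash
# Sketch of the compatibility rule. Names and arguments are hypothetical.

# Succeed if the directory contains at least one regular, executable file.
has_executable_hooks() {
  local dir="$1" f
  for f in "$dir"/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      return 0
    fi
  done
  return 1
}

# $1: hooks directory
# $2: number of updated nodes in this generation
# $3: "true" if the compatibility property is set in rudder-web.properties
# Succeeds when the legacy RUDDER_NODE_IDS parameter should still be defined.
should_define_legacy_node_ids() {
  local hooks_dir="$1" updated_count="$2" compat_enabled="$3"
  if [ "$compat_enabled" = "true" ]; then
    return 0
  fi
  has_executable_hooks "$hooks_dir" && [ "$updated_count" -lt 3000 ]
}

# Example call with the directory named above and 2500 updated nodes.
if should_define_legacy_node_ids /opt/rudder/etc/hooks.d/policy-generation-finished 2500 false; then
  echo "legacy RUDDER_NODE_IDS would be defined"
else
  echo "only RUDDER_NODE_IDS_PATH would be defined"
fi
```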
So in the general case, you don't have to do anything and everything will continue to work as before. You only have to do something when you have more than 3000 nodes and personal hooks in
In the latter case, you only need to source the file given by the RUDDER_NODE_IDS_PATH parameter (by default: /var/rudder/policy-generation-info/updated-nodeids) and use any of the variables defined in that file:
- RUDDER_UPDATED_POLICY_SERVER_IDS: the array of policy servers updated during the generation, sorted from root to immediate relays to farther relays
- RUDDER_UPDATED_NODE_IDS: the array of nodes updated during the generation, sorted alphanumerically
- RUDDER_NODE_IDS: the array of all updated elements, starting with policy servers, then simple nodes.
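A user hook adapted to the new parameter could look like the sketch below. The exact syntax of the generated file is not shown here, so the sample file in the demo setup is an assumption (plain bash array assignments with made-up node IDs); only the parameter name, the default path and the three variable names come from the description above.

```shell
#!/bin/bash
# Sketch of a user hook that reads the updated node IDs from the sourced file.

# --- Demo setup only: simulate what Rudder would provide to the hook. ---
# The file format (bash array assignments) and the IDs are assumptions.
sample_file="$(mktemp)" || exit 1
cat > "$sample_file" <<'EOF'
RUDDER_UPDATED_POLICY_SERVER_IDS=(root relay-1)
RUDDER_UPDATED_NODE_IDS=(node-a node-b node-c)
RUDDER_NODE_IDS=(root relay-1 node-a node-b node-c)
EOF
RUDDER_NODE_IDS_PATH="$sample_file"

# --- Hook body ---
# Use the path given by Rudder, falling back to the documented default.
ids_file="${RUDDER_NODE_IDS_PATH:-/var/rudder/policy-generation-info/updated-nodeids}"
if [ -r "$ids_file" ]; then
  # Sourcing at top level so the variables are defined for the whole script.
  . "$ids_file"
else
  echo "cannot read node IDs file: $ids_file" >&2
  exit 1
fi

echo "policy servers updated: ${#RUDDER_UPDATED_POLICY_SERVER_IDS[@]}"
echo "nodes updated:          ${#RUDDER_UPDATED_NODE_IDS[@]}"
for id in "${RUDDER_NODE_IDS[@]}"; do
  echo "updated: $id"
done

# --- Demo cleanup ---
rm -f "$sample_file"
```

In a real hook, the demo setup and cleanup parts would be dropped and only the hook body kept.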