Bug #15011

Updated by François ARMAND almost 5 years ago

I got the following error after clearing caches with 2500 nodes:
 <pre> 
 [2019-06-01 12:59:22] WARN    explain_compliance.a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4 - Received a run at 2019 
 -06-01T12:57:11.000Z for node 'a04d8f89-27f4-4428-ad89-6ab4e45798a4a04d8f89-27f4-4428-ad89-6ab4e45798a4' with configId '20190529-153902-d76426ef 
 ' but that node should be sending reports for configId 20190531-224745-355b5fdd 
 Jun 01, 2019 12:59:22 PM com.zaxxer.nuprocess.linux.LinuxProcess start 
 WARNING: Failed to start process 
 java.io.IOException: error=7, Argument list too long 
         at com.zaxxer.nuprocess.internal.LibJava8.Java_java_lang_UNIXProcess_forkAndExec(Native Method) 
         at com.zaxxer.nuprocess.linux.LinuxProcess.start(LinuxProcess.java:109) 
         at com.zaxxer.nuprocess.linux.LinProcessFactory.createProcess(LinProcessFactory.java:40) 
         at com.zaxxer.nuprocess.NuProcessBuilder.start(NuProcessBuilder.java:266) 
         at com.normation.rudder.hooks.RunNuCommand$.run(RunNuCommand.scala:153) 
         at com.normation.rudder.hooks.RunHooks$.$anonfun$asyncRun$3(RunHooks.scala:186) 
         at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303) 
         at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37) 
         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) 
         at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) 
         at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 
         at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) 
         at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) 
         at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) 

 [2019-06-01 12:59:22] INFO    policy.generation - Policy generation completed in:      2921610 ms 
 [2019-06-01 12:59:22] ERROR policy.generation - Error when updating policy, reason was: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'. 
  stdout:  
  stderr: '' 
 [2019-06-01 12:59:22] INFO    policy.generation - Flag file '/opt/rudder/etc/policy-update-running' successfully removed 
 [2019-06-01 12:59:22] ERROR policy.generation - Policy update error for process '111' at 2019-06-01 12:59:22: Exit code=-2147483648 for hook: '/opt/rudder/etc/hooks.d/policy-generation-finished/50-reload-policy-file-server'. 
 </pre> 


 h2. Cause 

 As explained in the comments, the problem is that we build a string of all updated node IDs and pass it to the hook through the environment variable @RUDDER_NODE_IDS@. 
 To do so, the JVM forks and passes the value as a process parameter. When the number of nodes grows large enough, we hit the @ARG_MAX@ limit, which is platform specific.  
 On Linux, it defaults to 1/4 of @ulimit -s@, and since @ulimit -s@ is @8MB@ by default, we have @ARG_MAX=2097152b@. 
 But it's a little less clear than that: if we increase @ulimit -s@ to, for example, @16MB@, the limit seen from the JVM is *still* @2097152b@, at least for OpenJDK 1.8.131.  
 So perhaps it's a hardcoded limit, or perhaps the interpretation differs from JVM to JVM.  
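
 The arithmetic behind the default value can be checked with a minimal shell sketch. It assumes the 8 MB default stack limit mentioned above (reported by @ulimit -s@ as 8192 KiB) and does not read anything from the running system; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumed default stack limit: 8 MiB, which `ulimit -s` reports in KiB.
stack_kib=8192

# ARG_MAX defaults to one quarter of the stack limit, expressed in bytes.
arg_max_bytes=$(( stack_kib * 1024 / 4 ))
echo "expected ARG_MAX: ${arg_max_bytes} bytes"

# To compare with the values in effect on a given machine:
#   ulimit -s          # stack limit, in KiB
#   getconf ARG_MAX    # argument + environment limit, in bytes
```

 With these assumed inputs the computed value is 2097152, the figure quoted above.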

 h2. Workaround 

 Given that increasing @ulimit -s@ on Linux does not increase the size of the string we are able to pass to the child process, we need to change the way we pass the parameter.  
 For Rudder 5.0.12 and up (and all more recent branches), the @RUDDER_NODE_IDS@ parameter is deprecated and is no longer documented in the hook template. It is replaced by a new documented parameter: @RUDDER_NODE_IDS_PATH@. That parameter contains the path to a file that can be sourced and that contains the list of nodes updated by that generation. Sourcing the file defines the variable @RUDDER_NODE_IDS@ if needed. 

 To avoid breaking existing user hooks, we still define the undocumented @RUDDER_NODE_IDS@ parameter, with the same format as in Rudder 5.0.11 and earlier, if:  

 - there are user hooks present and executable in @/opt/rudder/etc/hooks.d/policy-generation-finished/@ AND there are fewer than 3000 updated nodes 
 - *OR* the Rudder configuration file @/opt/rudder/etc/rudder-web.properties@ contains the property @rudder.hooks.policy-generation-finished.nodeids.compability=true@  

 So in the general case, you don't have to do anything and everything will continue to work as before. You only have to act when you have 3000 or more updated nodes and custom hooks in @policy-generation-finished@.  
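
 The two rules above can be sketched as a small shell predicate. This is only an illustration of the decision as described here, not Rudder's actual implementation; the function name @legacy_node_ids_defined@ and its three arguments are hypothetical.

```shell
#!/bin/sh
# legacy_node_ids_defined HOOKS_PRESENT UPDATED_NODES COMPAT_FLAG
# Hypothetical helper: succeeds (exit status 0) when the legacy
# RUDDER_NODE_IDS variable would still be defined under the two rules.
legacy_node_ids_defined() {
  hooks_present=$1   # "true" if executable user hooks exist
  updated_nodes=$2   # number of nodes updated by the generation
  compat_flag=$3     # value of the compatibility property
  # Rule 2: the compatibility property forces the legacy variable.
  if [ "$compat_flag" = "true" ]; then
    return 0
  fi
  # Rule 1: user hooks present AND fewer than 3000 updated nodes.
  [ "$hooks_present" = "true" ] && [ "$updated_nodes" -lt 3000 ]
}

legacy_node_ids_defined true 2500 false && echo "2500 nodes, user hooks: defined"
legacy_node_ids_defined true 3500 false || echo "3500 nodes, user hooks: not defined"
legacy_node_ids_defined true 3500 true && echo "3500 nodes, compatibility property: defined"
```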

 In the latter case, you only need to source the file given in the @RUDDER_NODE_IDS_PATH@ parameter (by default: @/var/rudder/policy-generation-info/updated-nodeids@) and use any of the variables defined in that file: 

 - @RUDDER_UPDATED_POLICY_SERVER_IDS@: the *array* of policy servers updated during the generation, sorted from root to immediate relays to farther relays 
 - @RUDDER_UPDATED_NODE_IDS@: the *array* of nodes updated during the generation, sorted alphanumerically 
 - @RUDDER_NODE_IDS@: the *array* of all updated elements, starting with policy servers, then simple nodes.
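
 A user hook adapted to the new parameter might look like the following bash sketch. To stay self-contained it writes a stand-in file first; the file layout (plain bash array assignments) and the IDs @root@, @relay-1@, @node-a@, @node-b@ are assumptions made for illustration. In a real hook, @RUDDER_NODE_IDS_PATH@ is provided by Rudder and the stand-in part is not needed.

```shell
#!/bin/bash
# Stand-in for the file Rudder generates. Its exact layout is an assumption;
# a real hook receives RUDDER_NODE_IDS_PATH from Rudder and skips this part.
RUDDER_NODE_IDS_PATH="$(mktemp)"
cat > "$RUDDER_NODE_IDS_PATH" <<'EOF'
RUDDER_UPDATED_POLICY_SERVER_IDS=(root relay-1)
RUDDER_UPDATED_NODE_IDS=(node-a node-b)
RUDDER_NODE_IDS=(root relay-1 node-a node-b)
EOF

# Source the file to define the three arrays in the current shell.
. "$RUDDER_NODE_IDS_PATH"

echo "policy servers updated: ${#RUDDER_UPDATED_POLICY_SERVER_IDS[@]}"
for id in "${RUDDER_UPDATED_NODE_IDS[@]}"; do
  echo "updated node: ${id}"
done
echo "all updated elements: ${RUDDER_NODE_IDS[*]}"

# Clean up the stand-in file (not needed in a real hook).
rm -f "$RUDDER_NODE_IDS_PATH"
```

 Because the values are arrays, the hook needs a shell that supports them, such as bash; quoting the expansion as @"${RUDDER_UPDATED_NODE_IDS[@]}"@ keeps each ID as a separate word.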
