Bug #10457 (closed)
Hook failed with fork: retry: No child processes
Description
I got an error after I found #10456:
[2017-03-17 15:31:18] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Policy generation completed in 93326 ms
[2017-03-17 15:31:18] ERROR com.normation.rudder.batch.AsyncDeploymentAgent$DeployerAgent - Error when updating policy, reason Cannot write configuration node <- Exit code=1 for hook: '/opt/rudder/etc/hooks.d/policy-generation-node-ready/10-cf-promise-check' with environment variables: [PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [NLSPATH:/usr/dt/lib/nls/msg/%L/%N.cat] [OLDPWD:/] [XFILESEARCHPATH:/usr/dt/app-defaults/%L/Dt] [PWD:/opt/rudder/jetty7] [SHLVL:1] [_:/usr/bin/java] [RUDDER_GENERATION_DATETIME:2017-03-17T15:29:45.296+01:00] [RUDDER_NODEID:61053f9f-b3de-4290-9eda-bc4fe1567233] [RUDDER_NEXT_POLICIES_DIRECTORY:/var/rudder/share/61053f9f-b3de-4290-9eda-bc4fe1567233/rules.new/cfengine-community] [RUDDER_AGENT_TYPE:cfengine-community]. Stdout: ' error: Can't stat file '/var/rudder/ncf//var/rudder/ncf/common/10_ncf_internals/list-compatible-inputs: fork: retry: No child processes' for parsing. (stat: No such file or directory) ' Stderr: ''
[2017-03-17 15:31:18] ERROR com.normation.rudder.batch.AsyncDeploymentAgent - Policy update error for process '12' at 2017-03-17 15:31:18: Cannot write configuration node
Could this be a resource limit (nofile, for instance) that prevents the process from forking?
Updated by Janos Mattyasovszky over 7 years ago
hah, found it:
[ 7219.731466] cgroup: fork rejected by pids controller in /system.slice/rudder-jetty.service
[12893.159767] cgroup: fork rejected by pids controller in /system.slice/rudder-jetty.service
Updated by Janos Mattyasovszky over 7 years ago
According to https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#TasksMax=N, the fix would be to include this line in the unit file (which currently is auto-generated):
TasksMax=infinity
Will test this.
Updated by Janos Mattyasovszky over 7 years ago
Copied the auto-generated unit file to /etc and added the missing line:
sles12# systemctl cat rudder-jetty
# /etc/systemd/system/rudder-jetty.service
[Unit]
SourcePath=/etc/init.d/rudder-jetty
After=remote-fs.target network-online.target
Wants=remote-fs.target network-online.target

[Service]
Type=forking
TasksMax=infinity   <== Added this
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/init.d/rudder-jetty start
ExecStop=/etc/init.d/rudder-jetty stop
Updated by Alexis Mousset over 7 years ago
- Category set to System integration
- Target version set to 4.1.0
- Severity set to Major - prevents use of part of Rudder | no simple workaround
Updated by François ARMAND over 7 years ago
Well, perhaps it's better if we cap the number of parallel hooks to, say, 50? (Or "number of CPUs + 1", or a configurable parameter.) That won't change the throughput, but it will certainly put less stress on the system and avoid hitting these limits.
Updated by Janos Mattyasovszky over 7 years ago
I'd be happy with nproc --all. The only problem is: what if I scale my VM up during operation and give it more cores? Would I have to restart jetty then? Could this be checked each time a policy generation starts?
Updated by François ARMAND over 7 years ago
Oh yes, the thread pool and manager logic is created each time. But I will make sure of that, thanks for pointing out that use case.
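A minimal sketch of the idea discussed above, assuming hooks can be modelled as plain `Callable` tasks. `BoundedHookRunner`, `runAll` and `defaultParallelism` are hypothetical names for illustration, not Rudder's actual code (see the pull request for the real change). The pool is built on every call, and `Runtime.availableProcessors()` is read again each time; its Javadoc notes that the value may change during the lifetime of the JVM, so a VM that gains cores should be picked up at the next generation without a restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs hook tasks with a hard cap on parallelism.
final class BoundedHookRunner {

    // "number of CPUs + 1", recomputed on every call rather than cached.
    static int defaultParallelism() {
        return Runtime.getRuntime().availableProcessors() + 1;
    }

    // Runs all hooks, never more than maxParallel at once, and returns
    // their results in submission order.
    static <T> List<T> runAll(List<Callable<T>> hooks, int maxParallel) {
        // A fresh fixed-size pool per run: at most maxParallel worker threads.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<T> done : pool.invokeAll(hooks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running hooks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a hook failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would use something like `BoundedHookRunner.runAll(hooks, BoundedHookRunner.defaultParallelism())` at the start of each policy generation. With a fixed cap, the number of simultaneous forks stays well under the cgroup task limit regardless of how many nodes are processed.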
Updated by François ARMAND over 7 years ago
- User visibility set to Infrequent - complex configurations | third party integrations
Updated by François ARMAND over 7 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND over 7 years ago
OK, so when using a real task manager, I get more consistent results, around 10% better. But performance measurements are hard, etc.
Before:
Write node configurations : 91750 ms
...
Write node configurations : 85166 ms
...
Write node configurations : 95879 ms
After:
Write node configurations : 79947 ms
...
Write node configurations : 79191 ms
...
Write node configurations : 75608 ms
See pull requests for details.
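For reference, the arithmetic behind the comparison can be checked with a short sketch. `WriteTimings` and its methods are hypothetical names; the six samples are the "Write node configurations" values quoted above. On these numbers the means are about 90.9 s before and 78.2 s after, a gain of roughly 14%, though three samples per side is a very small basis.

```java
import java.util.Arrays;

// Hypothetical helper: compares the mean of two sets of timings.
final class WriteTimings {

    // Arithmetic mean of the samples, in milliseconds.
    static double meanMs(long... samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative gain of "after" over "before", in percent.
    static double improvementPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double before = meanMs(91750, 85166, 95879);
        double after = meanMs(79947, 79191, 75608);
        System.out.printf("before=%.0f ms, after=%.0f ms, gain=%.1f%%%n",
                before, after, improvementPercent(before, after));
    }
}
```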
Updated by François ARMAND over 7 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/1608
Updated by Nicolas CHARLES over 7 years ago
Without the PR, for 1602 nodes:
[2017-03-23 14:37:58] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 55878 ms
[2017-03-23 14:46:10] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 57303 ms
[2017-03-23 14:47:35] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 39640 ms
[2017-03-23 14:48:42] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 33611 ms
[2017-03-23 14:50:04] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 35395 ms
With this PR:
[2017-03-23 15:09:16] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 34423 ms
[2017-03-23 15:10:31] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 40921 ms
[2017-03-23 15:12:14] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 48889 ms
[2017-03-23 15:14:09] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 32235 ms
Note that this was measured on a laptop, so the numbers are not really reliable.
Updated by François ARMAND over 7 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|bef6684f6635b052d8fab3bccc91da0c5e0216cb.
Updated by Janos Mattyasovszky over 7 years ago
Without this PR, on 32 CPUs and 7000+ nodes:
Dunno, it never finished, and I stopped it after 9+ hours.
With this PR (same system):
About 28 minutes in total (just base policy, no rules/directives).
Updated by Benoît PECCATTE over 7 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 4.1.0 which was released today.
- 4.1.0: Announce Changelog
- Download: https://www.rudder-project.org/site/get-rudder/downloads/