Bug #10456
closedSeveral core features don't work anymore with more than 1000 nodes
Description
I somehow managed to produce this by adding 2287 nodes to a 4.1-rc1, and this happened after the jetty-restart:
[2017-03-17 15:09:26] INFO migration - Migration of inline variables in Directives to 'rudder' namespace already done, skipping [2017-03-17 15:09:26] INFO bootchecks - Check for force reload of Techniques library [2017-03-17 15:09:26] INFO bootchecks - Flag file '/opt/rudder/etc/force_technique_reload' does not exist, do not Technique library will not be reloaded [2017-03-17 15:09:26] INFO bootchecks - Check that system groups 'hasPolicyServer-*' are dynamics and correct them [2017-03-17 15:09:26] INFO bootchecks - Resume policy update if it was running before shutdown [2017-03-17 15:09:26] INFO bootchecks - Flag file '/opt/rudder/etc/policy-update-running' found, Start a new policy update now [2017-03-17 15:09:26] INFO bootstrap.liftweb.Boot - classpath [2017-03-17 15:09:26] INFO com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Start policy generation, checking updated rules [2017-03-17 15:09:26] INFO application - Application Rudder started [2017-03-17 15:09:26] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Pre-policy-generation scripts hooks ran in 20 ms [2017-03-17 15:09:26] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Pre-policy-generation modules hooks in 2 ms, start getting all generation related data. 2017-03-17 15:09:26.455:INFO:oejs.AbstractConnector:Started SelectChannelConnector@127.0.0.1:8080 [2017-03-17 15:09:26] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - All relevant information fetched in 313 ms, start names historization. [2017-03-17 15:09:30] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Node contexts built in 4187 ms, start to build new node configurations. [2017-03-17 15:09:31] ERROR historization - Error with update nodes historization process <- Could not update the nodes historization information in base. <- null [2017-03-17 15:09:31] ERROR historization - Root cause was: null [2017-03-17 15:09:31] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Policy generation completed in 4808 ms [2017-03-17 15:09:31] ERROR com.normation.rudder.batch.AsyncDeploymentAgent$DeployerAgent - Error when updating policy, reason Could not update the nodes historization information in base. <- null [2017-03-17 15:09:31] ERROR com.normation.rudder.batch.AsyncDeploymentAgent - Policy update error for process '10' at 2017-03-17 15:09:31: Could not update the nodes historization information in base. Exception in thread "pool-6-thread-15" java.lang.StackOverflowError at doobie.util.atom$Atom$$anon$2.$anonfun$set$2(atom.scala:47) at doobie.util.atom$Atom$$anon$2.$anonfun$set$2$adapted(atom.scala:47) at doobie.util.composite$LowerPriorityComposite$$anon$6.$anonfun$set$4(composite.scala:109) at doobie.util.composite$LowerPriorityComposite$$anon$6.$anonfun$set$4$adapted(composite.scala:109) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2$adapted(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2$adapted(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2$adapted(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2$adapted(composite.scala:56) at doobie.util.composite$Composite$$anon$2.$anonfun$set$2(composite.scala:56) [[[cut for sanity, but this continues down quite a lot]]]
Updated by Alexis Mousset almost 8 years ago
- Category set to Web - Compliance & node report
- Target version set to 4.1.0
- Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
Updated by Janos Mattyasovszky almost 8 years ago
This happend with currently 2287 in the system, I will try to go back to ~1000 nodes and test again.
Updated by François ARMAND almost 8 years ago
This is most likelly the same bug as #10380, but it is more impacting.
A first, simple workarounnd may be to just increase the available stack space. @Janos Matya, would you mind trying to add in `/etc/default/rudder-jetty` the following arg:
JAVA_OPTIONS="-Xss4M"
But the correct solution is to check the number of element in the "in" clause, and if more than, say 200, create a temporary table + join.
Updated by François ARMAND almost 8 years ago
- Related to Bug #10380: Dashboard do not load due to stackoverflow error in 4.1 added
Updated by Nicolas CHARLES almost 8 years ago
- Related to Bug #10453: Error during policy generation added
Updated by François ARMAND almost 8 years ago
- Subject changed from Cannot historize names to Stackoverflow during generation for 2200 nodes
And as Nicolas reported, the "historization: null" is most likelly something that is not correctly initialized, and we will be correcting it in #10453
Updated by Janos Mattyasovszky almost 8 years ago
that did work.
[2017-03-20 13:06:43] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Node configuration written on filesystem in 439848 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Node configuration cached in LDAP in 296 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Node configuration written in 440261 ms, start to update expected reports. [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Reports updated in 866 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Post-policy-generation hooks ran in 74 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Timing summary: [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Run pre-gen scripts hooks : 29 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Run pre-gen modules hooks : 2 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Fetch all information : 358 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Historize names : 901 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Build current rule values : 15 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Build target configuration: 766 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Update rule vals : 65 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Increment rule serials : 59 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Write node configurations : 440261 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Save expected reports : 866 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Run post generation hooks : 74 ms [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Number of nodes updated : 1087 [2017-03-20 13:06:44] DEBUG com.normation.rudder.services.policies.PromiseGenerationServiceImpl - Policy generation completed in 445502 ms [2017-03-20 13:06:44] INFO com.normation.rudder.batch.AsyncDeploymentAgent - Successful policy update '21' [started 2017-03-20 12:59:19 - ended 2017-03-20 13:06:44]
PS: These times were with writing to a ramdisk:
# df -hTP /var/rudder/share Filesystem Type Size Used Avail Use% Mounted on none tmpfs 2.0G 876M 1.2G 43% /var/rudder/share
Updated by François ARMAND almost 8 years ago
OK, nice, we have a workaround.
Numbers are interesting, to. We are spending just far too much time writting files :/
Updated by François ARMAND almost 8 years ago
Let merge all the stackoverflow problem under that one.
To sum-up, the problem is in our use of doobie Fragments.in as reported here: https://github.com/tpolecat/doobie/issues/426
Updated by François ARMAND almost 8 years ago
- Related to deleted (Bug #10453: Error during policy generation)
Updated by François ARMAND almost 8 years ago
- Has duplicate Bug #10453: Error during policy generation added
Updated by François ARMAND almost 8 years ago
- Related to deleted (Bug #10380: Dashboard do not load due to stackoverflow error in 4.1)
Updated by François ARMAND almost 8 years ago
- Has duplicate Bug #10380: Dashboard do not load due to stackoverflow error in 4.1 added
Updated by François ARMAND almost 8 years ago
- Subject changed from Stackoverflow during generation for 2200 nodes to Several core features don't work anymore with more than 1000 nodes
- User visibility set to Operational - other Techniques | Technique editor | Rudder settings
Updated by François ARMAND almost 8 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND almost 8 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder/pull/1605
Updated by Nicolas CHARLES almost 8 years ago
- Assignee changed from Nicolas CHARLES to Alexis Mousset
- Pull Request changed from https://github.com/Normation/rudder/pull/1605 to https://github.com/Normation/rudder/pull/1606
Updated by François ARMAND almost 8 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|916000fe6888fce9b1aae5a01d182dfc61190a1f.
Updated by Benoît PECCATTE over 7 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 4.1.0 which was released today.
- 4.1.0: Announce Changelog
- Download: https://www.rudder-project.org/site/get-rudder/downloads/
Updated by Nicolas CHARLES over 7 years ago
- Related to Bug #11037: Missing agent reports after Rudder server restart added