Architecture #6087

closed

Architecture to handle thousands of nodes

Added by François ARMAND over 9 years ago. Updated over 7 years ago.

Status: Rejected
Priority: N/A
Category: Performance and scalability

Description

Since 3.0, Rudder correctly manages hundreds of nodes. Once we reach a couple of thousand nodes, Rudder becomes unpleasant to use (very much so).

The main pain points are:

- the UI is slow;
- policy generation takes several minutes (even with the cf-promises check disabled).

The main reasons for this (which are linked) are:
- we have (almost) no cache;
- we don't handle diff/gradual updates.

Typically, we should be able to write only the promise files that are actually modified, not all files for a given node. Ideally, we should only write parameter-like files, with all other files shared in a common place (and linked to with ln -s).
Example of a cache that could be set up: we know precisely when the list of directives changes.
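
The "write only what changed, symlink the rest" idea could be sketched as follows. This is a minimal illustration in Python, not Rudder's actual implementation: the function names `write_if_changed` and `link_shared` and the directory layout are hypothetical.

```python
import os
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write `content` to `path` only if it differs from what is on disk.

    Returns True when the file was (re)written, False when it was left alone.
    """
    data = content.encode("utf-8")
    if path.is_file() and not path.is_symlink() and path.read_bytes() == data:
        return False  # identical content: skip the write entirely
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)  # atomic rename when both are on the same filesystem
    return True


def link_shared(shared_file: Path, node_dir: Path) -> bool:
    """Expose a shared file inside a node directory through a symlink (ln -s).

    Returns True when the link was created or repaired, False if already correct.
    """
    link = node_dir / shared_file.name
    target = str(shared_file.resolve())
    if link.is_symlink() and os.readlink(link) == target:
        return False
    node_dir.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink(target, link)
    return True
```

With such helpers, a generation run would only touch node-specific parameter files whose content changed, and every other file would exist once in a common directory.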

The major cache that we could set up is on nodes (inventories).

Note that in #5452, we introduced a cache for nodeInfo, but the underlying architecture was not adapted. Ideally, we want an event to be generated when a node/inventory/config is modified, and to use these events to invalidate caches.
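
As a rough sketch of event-driven invalidation (in Python for brevity, while Rudder itself is written in Scala), a read-through cache can subscribe to change events and drop only the affected entry. The class names `EventBus` and `InvalidatingCache` and the event name `"node-modified"` are assumptions made for illustration, not existing Rudder APIs.

```python
from typing import Any, Callable, Dict, List


class EventBus:
    """Minimal synchronous publish/subscribe bus (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[str], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, entity_id: str) -> None:
        # Notify every handler registered for this event type.
        for handler in self._handlers.get(event_type, []):
            handler(entity_id)


class InvalidatingCache:
    """Read-through cache whose entries are dropped when a change event arrives."""

    def __init__(self, loader: Callable[[str], Any], bus: EventBus, event_type: str) -> None:
        self._loader = loader  # slow lookup, e.g. a query to the backend
        self._entries: Dict[str, Any] = {}
        bus.subscribe(event_type, self.invalidate)

    def get(self, key: str) -> Any:
        if key not in self._entries:
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)  # the next get() reloads this entry only
```

The code that modifies a node would call something like `bus.publish("node-modified", node_id)`; every other entry stays cached.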


Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #5452: Performance issue for node list (Released, Vincent MEMBRÉ, 2014-09-01)
Actions #1

Updated by François ARMAND over 9 years ago

Here are some metrics to get the size of things:

For 2000 nodes, ~10 user directives (based on different techniques) per node, ~30 rules, on a machine without SSD, 2 GB for inventory, 2 GB for Rudder (which seems to be the lower bound):

Promise generation takes > 7 minutes, ~5 of which are spent writing 150000 files.

Here are the memory sizes, in bytes, of the (naive) data structures used:

All rules: 56160
All node infos: 3406344
All inventories: 183238992 => ~100 kB per inventory
All directives: 1570880
All groups: 488368
All parameters: 1472

So, even with naive data structures, we could very reasonably brute-force cache everything and only have around 1 GB of RAM taken by these caches for 10 000 nodes.

In fact, the LDAP backend is of no use at all in that context, so we can get RAM back from it.
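
The estimate can be checked with a short calculation. The sketch below assumes that only node infos and inventories grow linearly with the number of nodes, and that rules, directives, groups and parameters stay roughly constant; that split is an assumption, not something measured here.

```python
MEASURED_NODES = 2000
TARGET_NODES = 10_000

# Sizes in bytes, as measured for 2000 nodes.
# Assumption: these two structures scale linearly with the node count.
SCALES_WITH_NODES = {"node infos": 3_406_344, "inventories": 183_238_992}
# Assumption: these stay roughly constant when nodes are added.
ROUGHLY_CONSTANT = {
    "rules": 56_160,
    "directives": 1_570_880,
    "groups": 488_368,
    "parameters": 1_472,
}


def estimate_cache_bytes(target_nodes: int) -> int:
    """Linear extrapolation of the total cache size for `target_nodes` nodes."""
    factor = target_nodes / MEASURED_NODES
    return int(sum(SCALES_WITH_NODES.values()) * factor + sum(ROUGHLY_CONSTANT.values()))


bytes_per_inventory = SCALES_WITH_NODES["inventories"] / MEASURED_NODES
total_bytes = estimate_cache_bytes(TARGET_NODES)

# File-writing figures from the same measurement: 150000 files in about 5 minutes.
files_per_node = 150_000 / MEASURED_NODES
files_per_second = 150_000 / (5 * 60)

print(f"per inventory: {bytes_per_inventory / 1e3:.0f} kB")
print(f"cache for {TARGET_NODES} nodes: {total_bytes / 1e9:.2f} GB")
print(f"files per node: {files_per_node:.0f}, write rate: {files_per_second:.0f} files/s")
```

Under these assumptions the total stays just under 1 GB for 10 000 nodes, and the write phase corresponds to 75 files per node at roughly 500 files per second.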

Actions #2

Updated by Nicolas CHARLES over 9 years ago

François ARMAND wrote:

> Here are some metrics to get the size of things:
>
> For 2000 nodes, ~10 user directives (based on different techniques) per node, ~30 rules, on a machine without SSD, 2 GB for inventory, 2 GB for Rudder (which seems to be the lower bound):
>
> Promise generation takes > 7 minutes, ~5 of which are spent writing 150000 files.
>
> Here are the memory sizes, in bytes, of the (naive) data structures used:
>
> All rules: 56160
> All node infos: 3406344
> All inventories: 183238992 => ~100 kB per inventory

Be careful, I think that in your test all nodes have the same software, which is not the case in a real-size environment.

> All directives: 1570880

I'm really surprised by this size: 10 directives use 1.5 MB? How come?

> All groups: 488368
> All parameters: 1472
>
> So, even with naive data structures, we could very reasonably brute-force cache everything and only have around 1 GB of RAM taken by these caches for 10 000 nodes.
>
> In fact, the LDAP backend is of no use at all in that context, so we can get RAM back from it.

Actions #3

Updated by François ARMAND over 9 years ago

Nicolas CHARLES wrote:

> François ARMAND wrote:
>
>> Here are some metrics to get the size of things:
>>
>> For 2000 nodes, ~10 user directives (based on different techniques) per node, ~30 rules, on a machine without SSD, 2 GB for inventory, 2 GB for Rudder (which seems to be the lower bound):
>>
>> Promise generation takes > 7 minutes, ~5 of which are spent writing 150000 files.
>>
>> Here are the memory sizes, in bytes, of the (naive) data structures used:
>>
>> All rules: 56160
>> All node infos: 3406344
>> All inventories: 183238992 => ~100 kB per inventory
>
> Be careful, I think that in your test all nodes have the same software, which is not the case in a real-size environment.

Yes, it's taken care of (see for example: http://www.rudder-project.org/redmine/issues/5965#note-10)

>> All directives: 1570880
>
> I'm really surprised by this size: 10 directives use 1.5 MB? How come?

No, there are 10 directives per node, but not 10 directives in total. There are ~100 categories/techniques/directives.
And here, it's the Scala data structure (FullActiveTechniqueCategory), which is quite heavy.

Actions #4

Updated by François ARMAND over 9 years ago

  • Description updated (diff)
Actions #5

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 3.1.0~beta1 to 3.1.0~rc1
Actions #6

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 3.1.0~rc1 to 3.1.0
Actions #7

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 3.1.0 to 3.1.1
Actions #8

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 3.1.1 to 3.1.2
Actions #9

Updated by Jonathan CLARKE over 8 years ago

  • Target version changed from 3.1.2 to Ideas (not version specific)
Actions #10

Updated by François ARMAND over 7 years ago

  • Status changed from New to Rejected

In 4.0:

- almost everything is cached,
- generation time is dominated by cf-promises checks and, without them, by file writing (about which we can't do anything for now).

So I'm closing this ticket, and we will open more specific ones when needed.
