Bug #24713

closed

Dynamic groups are slow to compute in Rudder 8.1

Added by François ARMAND 18 days ago. Updated 4 days ago.

Status:
Released
Priority:
N/A
Category:
Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
Checked
Regression:
No

Description

Since we changed the query processor in Rudder 8.1, dynamic groups are slow to compute, especially on instances with thousands of nodes.


Related issues 2 (1 open, 1 closed)

Related to Rudder - Bug #24652: Rudder 8.1 slows down over time (New, Nicolas CHARLES)
Related to Rudder - Bug #24712: ExpiredCompliance events are piling up (Released, Nicolas CHARLES)
Actions #1

Updated by François ARMAND 18 days ago

  • Related to Bug #24652: Rudder 8.1 slows down over time added
Actions #2

Updated by François ARMAND 18 days ago

  • Related to Bug #24712: ExpiredCompliance events are piling up added
Actions #3

Updated by François ARMAND 18 days ago · Edited

So, the new query processor works in two steps:

  • first, analysis. We have three kinds of backends: one working on CoreNodeFact objects in cache, one working on subgroups, and one for the others, using the old LDAP query.
    • for CoreNodeFact matchers, it directly matches properties on Scala objects,
    • for SubGroup matchers, the analysis queries the group's nodeIds for each subgroup and then uses that set of nodeIds,
    • for LDAP matchers, the analysis does the old LDAP query composition of all LDAP lines to get the set of nodeIds matching them.

    Analysis also handles the correct composition of lines (and/or, inversion, with root or not).

  • second, we process the query, i.e. we run one time through all CoreNodeFacts and match each node against the (composed) matcher.
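The two-step design above can be sketched roughly as a matcher tree built during analysis, then applied in a single pass over node facts. This is a hypothetical simplification (NodeFact, Matcher and the variants below are illustrative names, not Rudder's actual API):

```scala
// Hypothetical simplified node fact: the real CoreNodeFact has many more fields.
final case class NodeFact(id: String, os: String, hostname: String)

// A matcher produced by the analysis step; And/Or/Not compose lines.
sealed trait Matcher { def matches(n: NodeFact): Boolean }
final case class OsIs(os: String) extends Matcher {
  // CoreNodeFact-style matcher: direct property check on the Scala object
  def matches(n: NodeFact): Boolean = n.os == os
}
final case class IdIn(ids: Set[String]) extends Matcher {
  // SubGroup/LDAP-style matcher: a precomputed set of matching node IDs
  def matches(n: NodeFact): Boolean = ids.contains(n.id)
}
final case class And(ms: List[Matcher]) extends Matcher {
  def matches(n: NodeFact): Boolean = ms.forall(_.matches(n))
}
final case class Or(ms: List[Matcher]) extends Matcher {
  def matches(n: NodeFact): Boolean = ms.exists(_.matches(n))
}
final case class Not(m: Matcher) extends Matcher {
  def matches(n: NodeFact): Boolean = !m.matches(n)
}

// Step 2: one pass over all node facts with the composed matcher.
def process(nodes: List[NodeFact], m: Matcher): List[String] =
  nodes.collect { case n if m.matches(n) => n.id }
```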

So, further analysis shows that:

  • the size of the node fact does not matter: having a lot of software or just a couple does not change timing (apart from queries on software),
  • even for a simplistic query on just the OS, LDAP analysis takes almost all of the analysis time,
  • analysis time is more than linearly (quadratically? more?) correlated with the number of nodes,
  • even for a simplistic query, the process part is more than linearly correlated with the number of nodes.

We also see that if we remove the LDAP matcher, which should not be used on a simplistic OS query (because it's purely a CoreNodeFact property), then we get:

  • the process time depends on the number of nodes only logarithmically,
  • there is a constant factor between the process time and the equivalent Scala collect function on the node list: x100 for using ZIO, and x10 for having logs (not sure why; it could need a dedicated analysis, but likely the strings are built even if the log is not used, i.e. we are missing some call-by-name somewhere). But even with that x1000 factor, we are still in the range of a few milliseconds (versus microseconds for the chunk traversal with collect), so we don't really care,
  • more importantly, the analysis time drops to the microsecond range, which is what is expected for a couple of compositions, even with 10 lines of criteria.
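The "strings are built even if the log is not used" hypothesis is the classic by-value versus call-by-name argument issue. A minimal, hypothetical illustration (not Rudder's actual logger):

```scala
// Minimal sketch: with a by-value parameter, a disabled logger still
// forces construction of the message string at the call site; with a
// by-name (=> String) parameter, the message is only built if the
// level check passes.
class SketchLogger(enabled: Boolean) {
  // eager: `msg` is evaluated before the call, even when disabled
  def debugEager(msg: String): Unit = if (enabled) println(msg)
  // call-by-name: `msg` is only evaluated if the guard passes
  def debugLazy(msg: => String): Unit = if (enabled) println(msg)
}
```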

So, the root cause is clearly linked to the LDAP matcher.

More analysis shows that:

  • 1/ we don't really split the matcher between the three kinds, so for a lot of cases (hostname, RAM, properties, etc.), we do BOTH an LDAP matcher and a CoreNodeFact matcher,
  • 2/ when there is no LDAP line in the matcher, we are still doing an LDAP query (instead of "no query, just skip that since we don't have any"),
  • 3/ an LDAP query with 0 criteria returns ALL nodes, which is long (which is why the analysis part was long and depended on the number of nodes), and which means that we then do a "set of all node IDs".contains(nodeId) on each node (which explains why the process part was so long, too).
So, a first simple couple of changes would be:

  • really deduplicate on matcher type,
  • if there is no LDAP matcher, just skip the LDAP query.
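A hypothetical sketch of that second change (illustrative names, not Rudder's API): split criteria by backend first, and only issue the LDAP query when at least one criterion actually needs it.

```scala
// Criteria split by backend kind; only LdapCriterion needs the directory.
sealed trait Criterion
final case class FactCriterion(attribute: String, value: String) extends Criterion
final case class LdapCriterion(filter: String) extends Criterion

// Returns Some(nodeIds) to intersect with, or None when no LDAP
// criterion is present, in which case no query is run at all.
def ldapNodeIds(
    criteria: List[Criterion],
    runLdapQuery: List[LdapCriterion] => Set[String]
): Option[Set[String]] = {
  val ldapCriteria = criteria.collect { case c: LdapCriterion => c }
  if (ldapCriteria.isEmpty) None // skip: nothing for the LDAP backend
  else Some(runLdapQuery(ldapCriteria))
}
```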

Another optimization could be: if group/LDAP returns a set of node IDs the size of all nodes, then just return an always-true matcher. But I'm not totally sure LDAP doesn't contain crap that could make that assumption false.
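A hypothetical sketch of that idea (given the caveat above about stray entries in LDAP, an equality check against the known node IDs is safer than just comparing sizes):

```scala
// If the matching set covers exactly every known node, skip the
// per-node set lookup entirely and return a constant always-true
// predicate; otherwise fall back to set membership.
def idSetMatcher(matching: Set[String], allNodeIds: Set[String]): String => Boolean =
  if (matching == allNodeIds) _ => true // always-true: no lookup per node
  else matching.contains
```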

Actions #4

Updated by François ARMAND 18 days ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND
Actions #5

Updated by François ARMAND 18 days ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder/pull/5601
Actions #6

Updated by François ARMAND 14 days ago

With the proposed correction in PR, I get:

2024-04-15 19:21:51+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 36028 ms
2024-04-15 19:24:37+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 49727 ms
2024-04-15 19:25:01+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 23767 ms
2024-04-15 19:34:52+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 23668 ms
2024-04-15 19:36:22+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 22414 ms

So a roughly 200x speed-up compared to the first logs on the load server data (which may not be representative).

Actions #7

Updated by Anonymous 11 days ago

  • Status changed from Pending technical review to Pending release
Actions #8

Updated by Alexis Mousset 8 days ago

  • Subject changed from Dynamic group are slow to compute in Rudder 8.1 to Dynamic groups are slow to compute in Rudder 8.1
Actions #9

Updated by François ARMAND 6 days ago

  • Fix check changed from To do to Checked
Actions #10

Updated by Vincent MEMBRÉ 4 days ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 8.1.1 which was released today.
