Bug #24713
Dynamic groups are slow to compute in Rudder 8.1 (closed)
Description
Since we changed the query processor in Rudder 8.1, dynamic groups are slow to compute, especially on instances with thousands of nodes.
Updated by François ARMAND 7 months ago
- Related to Bug #24652: Rudder 8.1 slows down over time added
Updated by François ARMAND 7 months ago
- Related to Bug #24712: ExpiredCompliance events are pilling up added
Updated by François ARMAND 7 months ago · Edited
So, the new query processor works in two steps:
- first, analysis. We have three kinds of backends: one working on CoreNodeFact objects in cache, one working on subgroups, and one for everything else, using the old LDAP query.
  - for CoreNodeFact matchers, it directly matches properties on Scala objects,
  - for SubGroup matchers, the analysis queries the group's nodeIds for each subgroup and then uses that set of nodeIds,
  - for LDAP matchers, the analysis does the old LDAP query composition of all LDAP lines to get the set of nodeIds matching them.
  Analysis also handles the composition of lines correctly: and/or, inversion, with root or not.
- second, we process the query, i.e. we run once through all CoreNodeFact objects and match each node against the (composed) matcher.
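The two steps above can be sketched as follows. This is a minimal, simplified model and not Rudder's actual code: `NodeFact`, `Criterion`, `analyse` and `process` are hypothetical names introduced only for illustration, and the LDAP backend is replaced by a plain function returning node IDs.

```scala
// Hypothetical, simplified model of the two-step processor; none of these
// names come from the Rudder code base.
object QueryProcessorSketch {
  final case class NodeFact(id: String, os: String, ramMb: Int)

  // A matcher is a predicate on a node fact.
  type Matcher = NodeFact => Boolean

  sealed trait Criterion
  // Matched directly on in-memory objects.
  final case class FactCriterion(test: NodeFact => Boolean) extends Criterion
  // Node IDs already resolved from a subgroup.
  final case class SubGroupCriterion(memberIds: Set[String]) extends Criterion
  // Stand-in for the old LDAP query: returns the matching node IDs.
  final case class LdapCriterion(resolve: () => Set[String]) extends Criterion

  sealed trait Composition
  case object And extends Composition
  case object Or  extends Composition

  private def toMatcher(c: Criterion): Matcher = c match {
    case FactCriterion(test)    => test
    case SubGroupCriterion(ids) => (n: NodeFact) => ids.contains(n.id)
    case LdapCriterion(resolve) =>
      val ids = resolve() // the external query runs once, during analysis
      (n: NodeFact) => ids.contains(n.id)
  }

  // Step 1, analysis: build one matcher per line, then compose them.
  def analyse(lines: List[Criterion], composition: Composition): Matcher = {
    val matchers = lines.map(toMatcher)
    composition match {
      case And => (n: NodeFact) => matchers.forall(m => m(n))
      case Or  => (n: NodeFact) => matchers.exists(m => m(n))
    }
  }

  // Step 2, process: a single pass over all node facts.
  def process(nodes: Seq[NodeFact], matcher: Matcher): Set[String] =
    nodes.collect { case n if matcher(n) => n.id }.toSet
}
```

In this model, the cost of `process` is one matcher call per node, so anything expensive hidden inside a matcher (or inside `resolve()` during analysis) dominates the timing.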
So, further analysis shows that:
- the size of the node facts does not matter; in particular, having a lot of software or just a couple does not change timing (apart from queries on software),
- even with a simplistic query on just the OS, LDAP analysis takes almost all of the analysis time,
- analysis time grows more than linearly (quadratic? more?) with the number of nodes,
- even with a simplistic query, the process part grows more than linearly with the number of nodes.
We also see that if we remove the LDAP matcher, which should not be used for a simplistic query on OS (because it's purely a CoreNodeFact property), then we get:
- the process time depends only logarithmically on the number of nodes,
- we have a constant factor between the process time and the equivalent Scala collect function on the node list: the factor is x100 for using ZIO, and x10 for having logs (not sure why, it could need a dedicated analysis; likely the strings are built even if the log is not used, i.e. we are missing some call-by-name somewhere). But even with that x1000 factor, we are still in the range of a few milliseconds (versus microseconds for the chunk traversal with collect). So we don't really care.
- more importantly, the analysis time drops to the microsecond range, which is what is expected for composing a couple of matchers, even with 10 lines of criteria.
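To illustrate the "missing call-by-name" hypothesis for the logging overhead, here is a small sketch. It does not use Rudder's or ZIO's logging API: `Logger`, `debugEager`, `debugLazy` and `describe` are hypothetical names. It only shows that a `String` parameter is built by the caller even when the level is disabled, whereas a by-name `=> String` parameter is not.

```scala
// Hypothetical logger, for illustration only.
object LazyLogSketch {
  final class Logger(val debugEnabled: Boolean) {
    var emitted: List[String] = Nil

    // Eager: the caller builds the message before this method is entered,
    // whether or not debug is enabled.
    def debugEager(msg: String): Unit =
      if (debugEnabled) emitted = msg :: emitted

    // By-name: `msg` is only evaluated if debug is enabled.
    def debugLazy(msg: => String): Unit =
      if (debugEnabled) emitted = msg :: emitted
  }

  // Counts how many times a message was actually built.
  var built = 0

  def describe(nodeId: String): String = {
    built += 1
    s"matching node $nodeId against the composed matcher"
  }
}
```

With debug disabled, calling `debugEager(describe(id))` for every node still pays for one string construction per node, while `debugLazy(describe(id))` pays for none.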
So, the root cause is clearly linked to the LDAP matcher.
Further analysis shows that:
- 1/ we don't really split the matchers between the three kinds, so in a lot of cases (hostname, RAM, properties, etc.) we build BOTH an LDAP matcher and a CoreNodeFact matcher,
- 2/ when there is no matcher for any LDAP line, we still do an LDAP query (instead of "no query, just skip it since we don't have any criteria"),
- 3/ the LDAP query with 0 criteria returns ALL nodes, which is slow (hence the long analysis part depending on the number of nodes), and which means that we then do a "set of all node IDs".contains(nodeId) on each node (which explains why the process part was so long, too).
So, the fix is to:
- really deduplicate on matcher type,
- if there is no LDAP matcher, just skip the LDAP query.
Another optimization could be: if group/LDAP returns a set of node IDs the size of all nodes, then just return an always-true matcher. But I'm not totally sure LDAP doesn't contain crap that could make that assumption false.
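The difference between the behaviour described in points 2/ and 3/ and the proposed correction can be sketched like this. It is an illustration under assumptions, not the code from the pull request: `ldapQuery` is a hypothetical stand-in for the LDAP backend (here it simply filters an in-memory set), and `ldapMatcherBefore` and `ldapMatcherAfter` are made-up names.

```scala
// Hypothetical sketch; the LDAP backend is simulated in memory.
object LdapSkipSketch {
  type NodeId  = String
  type Matcher = NodeId => Boolean

  // Counts calls to the simulated LDAP backend.
  var ldapCalls = 0

  // Stand-in for the LDAP query: with zero criteria it returns ALL node IDs.
  def ldapQuery(allNodes: Set[NodeId], criteria: List[NodeId => Boolean]): Set[NodeId] = {
    ldapCalls += 1
    allNodes.filter(id => criteria.forall(c => c(id)))
  }

  // Before: always query, then test every node against the returned set,
  // even when that set is "all nodes".
  def ldapMatcherBefore(allNodes: Set[NodeId], criteria: List[NodeId => Boolean]): Matcher = {
    val ids = ldapQuery(allNodes, criteria)
    (id: NodeId) => ids.contains(id)
  }

  // After: no LDAP criteria means no LDAP query and no LDAP matcher at all.
  def ldapMatcherAfter(allNodes: Set[NodeId], criteria: List[NodeId => Boolean]): Option[Matcher] =
    if (criteria.isEmpty) None
    else {
      val ids = ldapQuery(allNodes, criteria)
      // Optional optimization: a result as large as the node list is treated
      // as always-true. This is only sound if the backend never returns IDs
      // outside the node list, which holds for this in-memory stand-in but is
      // exactly the doubt raised about real LDAP data.
      if (ids.size == allNodes.size) Some((_: NodeId) => true)
      else Some((id: NodeId) => ids.contains(id))
    }
}
```

Returning `None` rather than an always-true matcher when there are no LDAP criteria lets the caller drop the LDAP part from the composition entirely, which stays correct for both AND and OR composition.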
Updated by François ARMAND 7 months ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND 7 months ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Vincent MEMBRÉ
- Pull Request set to https://github.com/Normation/rudder/pull/5601
Updated by François ARMAND 7 months ago
With the proposed correction in PR, I get:
2024-04-15 19:21:51+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 36028 ms
2024-04-15 19:24:37+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 49727 ms
2024-04-15 19:25:01+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 23767 ms
2024-04-15 19:34:52+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 23668 ms
2024-04-15 19:36:22+0000 DEBUG dynamic-group.timing - Computing dynamic groups without dependencies finished in 22414 ms
So a roughly 200x speed-up compared to the first logs on the load server data (which may not be representative).
Updated by Anonymous 7 months ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|5432d58d65297fbfb7eed2a4d1ebe39f5dfc983c.
Updated by Alexis Mousset 7 months ago
- Subject changed from Dynamic group are slow to compute in Rudder 8.1 to Dynamic groups are slow to compute in Rudder 8.1
Updated by Vincent MEMBRÉ 7 months ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 8.1.1 which was released today.