Bug #15062: Allow only catching up with recent runs in agent report processing batch - Rudder - Issue Tracker

Bug #15062

closed

Allow only catching up with recent runs in agent report processing batch

Added by Nicolas CHARLES about 6 years ago. Updated about 6 years ago.

Status: Released
Priority: N/A
Assignee: Nicolas CHARLES
Category: Performance and scalability
Target version: 5.0.12
Pull Request: https://github.com/Normation/rudder/p...
Severity:
UX impact:
User visibility:
Effort required:
Priority:
Name check: Reviewed
Fix check: Checked
Regression:

Description

At startup, the web interface catches up with all reports sent while it was stopped, with a granularity of one day.
On a very loaded system, this is simply infeasible: there isn't enough RAM to fetch everything, and PostgreSQL chokes on the query:

2019-06-11 20:21:29 UTCLOG:  duration: 997284.994 ms  execute <unnamed>: select distinct
          T.nodeid, T.executiontimestamp, coalesce(C.keyvalue, '') as nodeconfigid, coalesce(C.iscomplete, false) as complete, T.insertionid
        from
          (select nodeid, executiontimestamp, min(id) as insertionid from ruddersysevents where id > $1 and id <= $2 group by nodeid, executiontimestamp) as T
        left join
          (select
            true as iscomplete, nodeid, executiontimestamp, keyvalue
          from
            ruddersysevents where id > $3 and id <= $4 and
            eventtype = 'control' and
            component = 'end'
          ) as C
        on T.nodeid = C.nodeid and T.executiontimestamp = C.executiontimestamp
2019-06-11 20:21:29 UTCDETAIL:  parameters: $1 = '1428065585', $2 = '1437233953', $3 = '1428065585', $4 = '1437233953'

We should:

be able to turn this feature off (or say: catch up only the last xx minutes, to avoid gray compliance at start)
be able to catch up on everything (when using the advanced reporting plugin), but process it in batches of yy minutes
improve indexes
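The first two points can be sketched as a small window generator. This is only an illustration of the idea, not the code from the pull request: the function name `catch_up_windows` and its parameters `max_catch_up` (the "xx minutes") and `batch` (the "yy minutes") are hypothetical, and it works on timestamps, whereas the actual batch tracks SQL IDs.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


def catch_up_windows(
    last_processed: datetime,
    now: datetime,
    batch: timedelta,
    max_catch_up: Optional[timedelta] = None,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) time windows to process, oldest first.

    Hypothetical helper: names and semantics are assumptions for illustration.
    """
    if batch <= timedelta(0):
        raise ValueError("batch must be positive")
    start = last_processed
    # Optionally ignore everything older than `max_catch_up` before `now`.
    if max_catch_up is not None:
        start = max(start, now - max_catch_up)
    # Split the remaining interval into windows of at most `batch`.
    while start < now:
        end = min(start + batch, now)
        yield (start, end)
        start = end
```

With `max_catch_up=None` the whole backlog is covered, but each window stays bounded by `batch`, so a single query never has to span a full day of reports.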

The query

select
            true as iscomplete, nodeid, executiontimestamp, keyvalue
          from
            ruddersysevents where id >  1428065585 and id <= 1437233953 and
            eventtype = 'control' and
            component = 'end' ;

takes 36 seconds (see https://explain.depesz.com/s/T07l) when there is no load on the database.

Only the index on component is used for this query, and nothing else.

That index is used for nothing but this query, so we should replace it with this partial index:
CREATE INDEX endRun_control_idx ON RudderSysEvents (id) WHERE eventType = 'control' and component = 'end';

With this index, the query takes 2.5 s on a very heavily loaded system (load of 4 on a 2-CPU system).
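To experiment with the partial index without a PostgreSQL instance, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. SQLite also supports partial indexes, but its planner and timings differ from PostgreSQL's, so this only illustrates the index definition and the shape of the query. The table is reduced to the columns the query touches, and the sample rows are invented.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for RudderSysEvents with the partial index."""
    conn = sqlite3.connect(":memory:")
    # Simplified schema (assumption): only the columns used by the query.
    conn.execute(
        "CREATE TABLE RudderSysEvents ("
        " id INTEGER, nodeid TEXT, executiontimestamp TEXT,"
        " eventtype TEXT, component TEXT, keyvalue TEXT)"
    )
    # Same definition as in the ticket: only end-of-run control rows are indexed.
    conn.execute(
        "CREATE INDEX endRun_control_idx ON RudderSysEvents (id) "
        "WHERE eventType = 'control' AND component = 'end'"
    )
    # Invented sample rows, for illustration only.
    rows = [
        (1, "node1", "2019-06-11 20:00:00", "result_success", "file", ""),
        (2, "node1", "2019-06-11 20:00:00", "control", "other", "cfg1"),
        (3, "node1", "2019-06-11 20:00:00", "control", "end", "cfg1"),
        (4, "node2", "2019-06-11 20:05:00", "control", "end", "cfg2"),
        (5, "node2", "2019-06-11 20:10:00", "control", "end", "cfg3"),
    ]
    conn.executemany("INSERT INTO RudderSysEvents VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


# The predicate repeats the index's WHERE clause so the index is eligible.
END_RUN_QUERY = (
    "SELECT nodeid, executiontimestamp, keyvalue FROM RudderSysEvents "
    "WHERE id > ? AND id <= ? AND eventtype = 'control' AND component = 'end' "
    "ORDER BY id"
)


def end_runs(conn: sqlite3.Connection, low: int, high: int) -> list:
    """Return end-of-run rows with low < id <= high."""
    return conn.execute(END_RUN_QUERY, (low, high)).fetchall()
```

Running `conn.execute("EXPLAIN QUERY PLAN " + END_RUN_QUERY, (1, 4)).fetchall()` shows which access path SQLite picks; the exact plan text depends on the SQLite version.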

Subtasks 4 (0 open — 4 closed)

Related issues 1 (0 open — 1 closed)

Updated by Nicolas CHARLES about 6 years ago

With the new index, the batch did finish:

[2019-06-11 21:05:12] DEBUG report - [Store Agent Run Times #1] checking agent runs from SQL ID 1428065585 [2019-06-11T18:31:06.000Z - 2019-06-11T21:05:12.467Z]

...

[2019-06-11 21:20:17] DEBUG report - [Store Agent Run Times #1] (905407 ms) Added or updated 32373 agent runs, up to SQL ID 1440557937 (last run time was 2019-06-11T21:00:48.000Z)

Updated by Nicolas CHARLES about 6 years ago

Method used to create the index:

SET maintenance_work_mem TO '2GB';

drop index component_idx;

CREATE INDEX endRun_control_idx ON RudderSysEvents (id) WHERE eventType = 'control' and component = 'end';

Updated by Nicolas CHARLES about 6 years ago

Status changed from New to In progress
Assignee set to Nicolas CHARLES

Updated by Nicolas CHARLES about 6 years ago

Status changed from In progress to Pending technical review
Assignee changed from Nicolas CHARLES to François ARMAND
Pull Request set to https://github.com/Normation/rudder/pull/2256

PR https://github.com/Normation/rudder/pull/2256

Updated by Rudder Quality Assistant about 6 years ago

Assignee changed from François ARMAND to Nicolas CHARLES

Updated by Nicolas CHARLES about 6 years ago

Status changed from Pending technical review to Pending release

Applied in changeset rudder|fba640a7b082f31fbb6689f321308c317fbbd474.

Updated by Vincent MEMBRÉ about 6 years ago

Name check set to To do

Updated by Vincent MEMBRÉ about 6 years ago

Fix check set to To do

Updated by François ARMAND about 6 years ago

Fix check changed from To do to Checked

Updated by Alexis Mousset about 6 years ago

Subject changed from store run agent batch may never catch up on very loaded system because of inefficient index and the way it handles reports to Allow only catching up with recent runs in agent report processing batch
