Bug #15062

closed

Allow only catching up with recent runs in agent report processing batch

Added by Nicolas CHARLES almost 5 years ago. Updated over 4 years ago.

Status:
Released
Priority:
N/A
Category:
Performance and scalability
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
Reviewed
Fix check:
Checked
Regression:

Description

At startup, the web interface catches up with all reports sent while it was stopped, with a granularity of one day.
On a very loaded system, this is simply infeasible: there isn't enough RAM to fetch everything, and PostgreSQL chokes on the query:

2019-06-11 20:21:29 UTCLOG:  duration: 997284.994 ms  execute <unnamed>: select distinct
          T.nodeid, T.executiontimestamp, coalesce(C.keyvalue, '') as nodeconfigid, coalesce(C.iscomplete, false) as complete, T.insertionid
        from
          (select nodeid, executiontimestamp, min(id) as insertionid from ruddersysevents where id > $1 and id <= $2 group by nodeid, executiontimestamp) as T
        left join
          (select
            true as iscomplete, nodeid, executiontimestamp, keyvalue
          from
            ruddersysevents where id > $3 and id <= $4 and
            eventtype = 'control' and
            component = 'end'
          ) as C
        on T.nodeid = C.nodeid and T.executiontimestamp = C.executiontimestamp
2019-06-11 20:21:29 UTCDETAIL:  parameters: $1 = '1428065585', $2 = '1437233953', $3 = '1428065585', $4 = '1437233953'

We should:
  1. be able to turn this feature off (or say: catch up only the last xx minutes, to avoid a gray compliance at start)
  2. be able to catch up on everything (when using the advanced reporting plugin), but process it in batches of yy minutes
  3. improve indexes
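
Points 1 and 2 above can be sketched as follows. This is only an illustration of the idea, not the code from the pull request: the function name `catch_up_windows` and its parameters `batch` (the "yy minutes") and `max_catch_up` (the "xx minutes") are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional, Tuple


def catch_up_windows(
    last_processed: datetime,
    now: datetime,
    batch: timedelta,
    max_catch_up: Optional[timedelta] = None,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (start, end) windows covering the catch-up period.

    Hypothetical helper: max_catch_up limits how far back we go (point 1),
    batch bounds the size of each processed slice (point 2).
    """
    if batch <= timedelta(0):
        raise ValueError("batch must be a positive duration")
    start = last_processed
    if max_catch_up is not None:
        # Never look further back than max_catch_up before now.
        start = max(start, now - max_catch_up)
    while start < now:
        end = min(start + batch, now)
        yield (start, end)
        start = end


if __name__ == "__main__":
    stopped_at = datetime(2019, 6, 10, 0, 0, tzinfo=timezone.utc)
    restarted_at = datetime(2019, 6, 11, 20, 0, tzinfo=timezone.utc)
    # Catch up only the last hour, in 15-minute batches.
    for window in catch_up_windows(
        stopped_at, restarted_at, timedelta(minutes=15), timedelta(hours=1)
    ):
        print(window[0].isoformat(), "->", window[1].isoformat())
```

Each window would then drive one bounded query instead of a single query over the whole downtime; passing `max_catch_up=None` corresponds to catching up on everything, still in bounded batches.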

The query

select
            true as iscomplete, nodeid, executiontimestamp, keyvalue
          from
            ruddersysevents where id >  1428065585 and id <= 1437233953 and
            eventtype = 'control' and
            component = 'end' ;

takes 36 seconds (see https://explain.depesz.com/s/T07l ) when there is no load on the database.

Only the index on component is used by this query, and that index serves no other purpose. We should replace it with this partial index:
CREATE INDEX endRun_control_idx ON RudderSysEvents (id) WHERE eventType = 'control' and component = 'end';

With this index, the query takes 2.5 s on a very heavily loaded system (load of 4 on a 2-CPU system).
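
To experiment with the partial index idea without a PostgreSQL server, here is a small sketch using SQLite from the Python standard library as a stand-in. SQLite also supports partial indexes, but its planner and timings differ from PostgreSQL, so this only shows the shape of the index and of the query. The table is reduced to the columns used in the query, and the sample rows are made up.

```python
import sqlite3

# Same filter as the end-of-run subquery in the description, with bound id limits.
END_RUN_QUERY = """
    SELECT nodeid, executiontimestamp, keyvalue
    FROM ruddersysevents
    WHERE id > ? AND id <= ?
      AND eventtype = 'control'
      AND component = 'end'
    ORDER BY id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with made-up rows and the partial index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ruddersysevents ("
        " id INTEGER NOT NULL, nodeid TEXT, executiontimestamp TEXT,"
        " eventtype TEXT, component TEXT, keyvalue TEXT)"
    )
    rows = [
        (1, "node1", "2019-06-11T20:00:00Z", "result_success", "File", "/etc/motd"),
        (2, "node1", "2019-06-11T20:00:00Z", "control", "start", "cfg-1"),
        (3, "node1", "2019-06-11T20:00:00Z", "control", "end", "cfg-1"),
        (4, "node2", "2019-06-11T20:05:00Z", "control", "end", "cfg-2"),
        (5, "node2", "2019-06-11T20:05:00Z", "result_error", "end", "oops"),
    ]
    conn.executemany("INSERT INTO ruddersysevents VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Partial index: only end-of-run control rows are indexed, keyed by id.
    conn.execute(
        "CREATE INDEX endrun_control_idx ON ruddersysevents (id)"
        " WHERE eventtype = 'control' AND component = 'end'"
    )
    return conn


def end_runs(conn: sqlite3.Connection, low: int, high: int) -> list:
    """Return end-of-run rows with low < id <= high."""
    return conn.execute(END_RUN_QUERY, (low, high)).fetchall()


def query_plan(conn: sqlite3.Connection, low: int, high: int) -> list:
    """Return the human-readable plan lines SQLite reports for the query."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + END_RUN_QUERY, (low, high))
    return [row[3] for row in cursor]


if __name__ == "__main__":
    db = build_demo_db()
    print(end_runs(db, 1, 4))
    print(query_plan(db, 1, 4))
```

Because the index contains only the rows matching both conditions, it stays small compared to an index on component alone, and a range scan on id reaches the end-of-run rows directly.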


Subtasks 4 (0 open, 4 closed)

Bug #15063: Change index on ruddersysevents to remove inefficient component index and replace it by a composite index (Released, François ARMAND)
Bug #15064: Add an entry in rudder-upgrade to run index migration script during upgrate (Released, François ARMAND)
Bug #15142: Missing migration script at upgrade from 4.1 to 5.0 on sles12 (Released, Vincent MEMBRÉ)
Bug #15076: typo in query from parent ticket (Released, Nicolas CHARLES)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #14959: Batch Store Run Agent can be limited only in days for catching up old report (Rejected)
Actions #1

Updated by Nicolas CHARLES almost 5 years ago

With the new index, the batch did finish:

[2019-06-11 21:05:12] DEBUG report - [Store Agent Run Times #1] checking agent runs from SQL ID 1428065585 [2019-06-11T18:31:06.000Z - 2019-06-11T21:05:12.467Z]

...

[2019-06-11 21:20:17] DEBUG report - [Store Agent Run Times #1] (905407 ms) Added or updated 32373 agent runs, up to SQL ID 1440557937 (last run time was 2019-06-11T21:00:48.000Z)

Actions #2

Updated by Nicolas CHARLES almost 5 years ago

Method used to create the index:

SET maintenance_work_mem TO '2GB';

drop index component_idx;

CREATE INDEX endRun_control_idx ON RudderSysEvents (id) WHERE eventType = 'control' and component = 'end';

Actions #3

Updated by Nicolas CHARLES almost 5 years ago

  • Status changed from New to In progress
  • Assignee set to Nicolas CHARLES
Actions #4

Updated by Nicolas CHARLES almost 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to François ARMAND
  • Pull Request set to https://github.com/Normation/rudder/pull/2256
Actions #5

Updated by Rudder Quality Assistant almost 5 years ago

  • Assignee changed from François ARMAND to Nicolas CHARLES
Actions #6

Updated by Nicolas CHARLES almost 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #7

Updated by Vincent MEMBRÉ almost 5 years ago

  • Name check set to To do
Actions #8

Updated by Vincent MEMBRÉ almost 5 years ago

  • Fix check set to To do
Actions #9

Updated by François ARMAND almost 5 years ago

  • Fix check changed from To do to Checked
Actions #10

Updated by Alexis Mousset almost 5 years ago

  • Subject changed from store run agent batch may never catch up on very loaded system because of inefficient index and the way it handles reports to Allow only catching up with recent runs in agent report processing batch
Actions #11

Updated by Alexis Mousset almost 5 years ago

  • Name check changed from To do to Reviewed
Actions #12

Updated by Vincent MEMBRÉ over 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 5.0.12 which was released today.

Actions #13

Updated by Nicolas CHARLES over 4 years ago

  • Related to Bug #14959: Batch Store Run Agent can be limited only in days for catching up old report added