Bug #15062
Closed: Allow only catching up with recent runs in agent report processing batch
Status:
Released
Priority:
N/A
Assignee:
Category:
Performance and scalability
Target version:
Pull Request:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
Reviewed
Fix check:
Checked
Regression:
Description
At start of the web interface, we catch up with all reports sent while the web interface was stopped, with a granularity of one day.
On a very loaded system, this is simply infeasible: there isn't enough RAM to fetch it all, and poor PostgreSQL chokes on the query:
2019-06-11 20:21:29 UTCLOG: duration: 997284.994 ms execute <unnamed>: select distinct T.nodeid, T.executiontimestamp, coalesce(C.keyvalue, '') as nodeconfigid, coalesce(C.iscomplete, false) as complete, T.insertionid from (select nodeid, executiontimestamp, min(id) as insertionid from ruddersysevents where id > $1 and id <= $2 group by nodeid, executiontimestamp) as T left join (select true as iscomplete, nodeid, executiontimestamp, keyvalue from ruddersysevents where id > $3 and id <= $4 and eventtype = 'control' and component = 'end' ) as C on T.nodeid = C.nodeid and T.executiontimestamp = C.executiontimestamp
2019-06-11 20:21:29 UTCDETAIL: parameters: $1 = '1428065585', $2 = '1437233953', $3 = '1428065585', $4 = '1437233953'
We should:
- be able to turn this feature off (or say: catch up only the last xx minutes, to avoid a gray compliance at start)
- be able to catch up with everything (when using the advanced reporting plugin), but deal with it in batches of yy minutes
- improve indexes
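The first two points can be illustrated with a small sketch. It is not Rudder's implementation: the function name `catchup_windows` and the parameters `batch_minutes` and `max_catchup_minutes` are hypothetical, standing in for the "yy" and "xx" settings above. It assumes catch-up can be expressed as consecutive time windows, each processed on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional, Tuple


def catchup_windows(
    last_processed: datetime,
    now: datetime,
    batch_minutes: int,
    max_catchup_minutes: Optional[int] = None,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) windows covering the catch-up period, oldest first.

    max_catchup_minutes=None catches up with everything,
    0 disables catch-up, N > 0 limits it to the last N minutes.
    """
    if batch_minutes <= 0:
        raise ValueError("batch_minutes must be positive")
    start = last_processed
    if max_catchup_minutes is not None:
        # Never look further back than the configured limit.
        start = max(start, now - timedelta(minutes=max_catchup_minutes))
    step = timedelta(minutes=batch_minutes)
    while start < now:
        end = min(start + step, now)
        yield start, end
        start = end


# Hypothetical scenario: the web interface was stopped for one day.
now = datetime(2019, 6, 11, 20, 0, tzinfo=timezone.utc)
stopped = now - timedelta(days=1)

recent_only = list(catchup_windows(stopped, now, 30, max_catchup_minutes=60))
catch_all = list(catchup_windows(stopped, now, 30))
disabled = list(catchup_windows(stopped, now, 30, max_catchup_minutes=0))
```

Each window would then drive one bounded query, so memory use and query duration depend on the batch size rather than on how long the web interface was down.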
The query
select true as iscomplete, nodeid, executiontimestamp, keyvalue from ruddersysevents where id > 1428065585 and id <= 1437233953 and eventtype = 'control' and component = 'end' ;
takes 36 seconds (see https://explain.depesz.com/s/T07l) when there is no load on the database.
The index on component is used, but nothing else is.
The index on component is only used for this query, so we should use this index instead:
CREATE INDEX endRun_control_idx ON RudderSysEvents (id) WHERE eventType = 'control' and component = 'end';
With this index, the query takes 2.5 s on a very heavily loaded system (load of 4 on a 2-CPU system).
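The shape of this partial index can be tried out locally with a minimal sketch. It swaps in SQLite (through Python's `sqlite3` module) for PostgreSQL, because SQLite also supports partial indexes and needs no server. The table definition is a simplified assumption containing only the columns named in the query, the rows are made-up sample data, and the timings above cannot be reproduced this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed schema: only the columns that appear in the query.
conn.execute(
    "CREATE TABLE ruddersysevents ("
    " id INTEGER, nodeid TEXT, executiontimestamp TEXT,"
    " eventtype TEXT, component TEXT, keyvalue TEXT)"
)

# Made-up sample rows: two runs, each closed by a 'control'/'end' event.
rows = [
    (1, "node1", "2019-06-11 20:00:00", "result_success", "common", ""),
    (2, "node1", "2019-06-11 20:00:00", "control", "end", "cfg-1"),
    (3, "node2", "2019-06-11 20:05:00", "control", "start", "cfg-2"),
    (4, "node2", "2019-06-11 20:05:00", "control", "end", "cfg-2"),
]
conn.executemany("INSERT INTO ruddersysevents VALUES (?, ?, ?, ?, ?, ?)", rows)

# Partial index: only end-of-run control events are indexed, ordered by id.
conn.execute(
    "CREATE INDEX endRun_control_idx ON ruddersysevents (id)"
    " WHERE eventtype = 'control' AND component = 'end'"
)

query = (
    "SELECT nodeid, executiontimestamp, keyvalue FROM ruddersysevents"
    " WHERE id > ? AND id <= ?"
    " AND eventtype = 'control' AND component = 'end' ORDER BY id"
)
end_runs = conn.execute(query, (0, 4)).fetchall()

# Print the plan chosen by SQLite; the last column holds the description.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, (0, 4)):
    print(step[-1])
```

The index stays small because it only contains rows that match the fixed predicates, and the remaining range condition on id maps directly onto the indexed column.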