Bug #7987
closedrudder-upgrade takes too long
Description
The pgsql query
select count(*) from (select xpath('/entry/*[@fileFormat=<?>]',data) AS x from eventlog) as Y where array_upper(x, 1) > 0takes too long if you have 300k+ entries.
Server: SLES11 SP3 // postgresql91-server-9.1.18-0.3.1
This is also heavily impacted by the fact, that the script /opt/rudder/bin/rudder-upgrade, that contains this statement, is run twice on a single-server installation:
A 3.0.11->3.0.13 update took:
rpm rudder-inventory-ldap: 61 minutes rpm rudder-webapp: 61 minutesThis could been solved if you would:
- Save each type of update that is not required, because it has been checked during any previous update.
- Execute this kind of maintenance by the jetty in an async job, not during the RPM installation.
Updated by Nicolas CHARLES almost 9 years ago
We could also create a index on the xpath query
See
http://www.postgresql.org/message-id/29360.1267795352@sss.pgh.pa.us
http://stackoverflow.com/questions/31589652/postgresql-xpath-integer-index
Updated by Janos Mattyasovszky almost 9 years ago
I did duplicate the data "eventlog" (27G Table when dumped, 1.7G compressed) and did some tests on it.
Query without the index:
# time psql -d rudder -c "select count(*) from (select xpath('/entry/*[@fileFormat=4]',data) AS x from eventlog) as Y where array_upper(x, 1) > 0;" count ------- 0 (1 row) real 5m1.342s user 0m0.000s sys 0m0.000s
Added an Index like this:
# time psql -d rudder -c "CREATE INDEX eventlog_fileFormat ON eventlog (((((xpath('/entry/*/@fileFormat',data))[1])::text)::integer))" CREATE INDEX real 5m1.029s user 0m0.000s sys 0m0.000s
Re-Run the same query:
# time psql -d rudder -c "select count(*) from (select xpath('/entry/*[@fileFormat=4]',data) AS x from eventlog) as Y where array_upper(x, 1) > 0;" count ------- 0 (1 row) real 5m1.093s user 0m0.000s sys 0m0.000s
Basically no change, you'd have to change the original query, since it queries the whole same dataset for each fileFormat version twice, it would be a lot better to query all distinct values found as fileFormats, and do decisions according to which versions have been found...
Updated by Nicolas CHARLES almost 9 years ago
Hum, if there's no change in the query time, then the index is not used.
Index looks correct, so maybe the query isn't
Updated by Janos Mattyasovszky almost 9 years ago
What actually would help (without actually improving the original query's speed) is to do something like this:
# time psql -d rudder -c "select distinct Format from (select ((xpath('/entry//@fileFormat',data))[1]::text) as Format from eventlog) AS foo;" format -------- 5 6 (3 rows) real 5m13.226s user 0m0.000s sys 0m0.000s
This would only parse all the table content once, then you could make decisions based on what results came back.
Updated by Janos Mattyasovszky almost 9 years ago
As discussed on IRC:
New query without an Index in place (still long):
# time psql -d rudder -c "select count(*) from (select ((xpath('/entry//@fileFormat',data))[1]::text) as id from eventlog) as T where id='5'" count -------- 138272 (1 row) real 5m7.942s user 0m0.000s sys 0m0.000s
Putting the index in place:
# time psql -d rudder -c "CREATE INDEX eventlog_fileFormat ON eventlog (((((xpath('/entry//@fileFormat',data))[1])::text)));" CREATE INDEX real 5m9.031s user 0m0.000s sys 0m0.000s
Now the query runs like a charm in basically not time:
# time psql -d rudder -c "select count(*) from (select ((xpath('/entry//@fileFormat',data))[1]::text) as id from eventlog) as T where id='5'" count -------- 138272 (1 row) real 0m0.093s user 0m0.000s sys 0m0.000s
Updated by Nicolas CHARLES almost 9 years ago
- Category changed from Server components to Packaging
- Assignee set to Nicolas CHARLES
- Target version set to 2.11.19
Updated by Nicolas CHARLES almost 9 years ago
- Status changed from New to In progress
Updated by Nicolas CHARLES almost 9 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/893
Updated by Nicolas CHARLES almost 9 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 50 to 100
Applied in changeset rudder-packages|295cb39e062e43ea831c39eeb481dc1c703e7120.
Updated by Vincent MEMBRÉ almost 9 years ago
- Status changed from Pending release to Released