Question #15436
closedHow to clean the tables 'nodes' and 'nodes_info' of seemingly out of date informations ?
Description
Hello,
After doing database maintenance operations (as per written in FAQ (1)), I explored a bit the database, to see which tables are taking the most space.
(For information, my RudderMachine has ~60GB of storage, and manages 113 nodes exactly.)
rudder=# SELECT pg_size_pretty( pg_database_size('rudder') );
pg_size_pretty
----------------
40 GB
So I checked for the heavyweight tables, and the winners are :
rudder=# SELECT relname AS "Table", pg_size_pretty(pg_total_relation_size(relid)) AS "Size", pg_size_pretty(pg_relation_size(relid)) AS "Relation size", pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size" FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5;
Table | Size | Relation size | External Size
-------------------------+---------+---------------+---------------
nodes_info | 16 GB | 4392 kB | 16 GB
nodes | 14 GB | 3405 MB | 10 GB
archivedruddersysevents | 3237 MB | 2706 MB | 530 MB
ruddersysevents | 2994 MB | 1429 MB | 1565 MB
archivednodecompliance | 842 MB | 699 MB | 143 MB
So, I do have 2 tables using half of the DB weight.
I did a bit of exploration (and thanks to ticket #9518), and found out that some nodes have a long history stored in db, even for deleted reports or old configurations checks.
For the table `nodes`, it seems I can remove safely historical entries from nodes with a request like DELETE FROM nodes WHERE endtime < (now() - interval '7 days')
, but I fear it might break something, so I won't do.
rudder=# select count(*) from nodes where endtime < (now() - interval '7 days');
count
----------
29083989
So, my question is:
- is the request written above (or a cleaner one) viable ?
- is there a possibility to clean the `nodes_info` table of what seems to be out of date values ?
Warm regards,
---
1. https://faq.rudder.io/knowledge-bases/2/articles/14-database-is-using-too-much-space