How to clean the tables 'nodes' and 'nodes_info' of seemingly out of date informations ?
After doing database maintenance operations (as per written in FAQ (1)), I explored a bit the database, to see which tables are taking the most space.
(For information, my RudderMachine has ~60GB of storage, and manages 113 nodes exactly.)
rudder=# SELECT pg_size_pretty( pg_database_size('rudder') ); pg_size_pretty ---------------- 40 GB
So I checked for the heavyweight tables, and the winners are :
rudder=# SELECT relname AS "Table", pg_size_pretty(pg_total_relation_size(relid)) AS "Size", pg_size_pretty(pg_relation_size(relid)) AS "Relation size", pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size" FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5; Table | Size | Relation size | External Size -------------------------+---------+---------------+--------------- nodes_info | 16 GB | 4392 kB | 16 GB nodes | 14 GB | 3405 MB | 10 GB archivedruddersysevents | 3237 MB | 2706 MB | 530 MB ruddersysevents | 2994 MB | 1429 MB | 1565 MB archivednodecompliance | 842 MB | 699 MB | 143 MB
So, I do have 2 tables using half of the DB weight.
I did a bit of exploration (and thanks to ticket #9518), and found out that some nodes have a long history stored in db, even for deleted reports or old configurations checks.
For the table `nodes`, it seems I can remove safely historical entries from nodes with a request like
DELETE FROM nodes WHERE endtime < (now() - interval '7 days'), but I fear it might break something, so I won't do.
rudder=# select count(*) from nodes where endtime < (now() - interval '7 days'); count ---------- 29083989
So, my question is:
- is the request written above (or a cleaner one) viable ?
- is there a possibility to clean the `nodes_info` table of what seems to be out of date values ?
Updated by Nicolas CHARLES 4 months ago
Thank you for your ticket.
I've never seen such large disk usage for table nodes, so there is definitively something odd here
The goal of table "nodes" is to store historical data for nodes, like knowing the name of nodes at certain time, or being able to know that there was a node with this name in the past - it is used mainly for the eventlogs
So, you should be able to delete old entries, but to be safe I wouldn't delete any entries younger than 1 or 2 months.
As for nodes_infos, this stores the config_ids that are still relevant (so all thoses with a current value), and are purged when a new config id appears.
There's a catch here: it seems that nodes_infos for deleted nodes are never purged (at least on my test system) - could it be that you deleted a lot lot of nodes?
If you run
select count(*) from nodes_infos;
you should have 113 result (one per node) - how many do you have ?
Updated by Kévin Mèche 4 months ago
In fact, we added and deleted some nodes up to 10 in total (if my memory serves me right).
Does the fact that some node are switched on/off (for maintenance operations, like adding RAM/ROM to the VM) has some impact?
For your questions :
rudder=# select count(*) from nodes_info; count ------- 113 (1 ligne)
For the deletion ... since it's a production environment, I request an overview :
DELETE FROM nodes WHERE endtime < (now() - interval '2 months');
But for `nodes_info` .. how can I clean this table, without breaking the history ?
In the worst case, since I stopped the service, I thought of deleting all informations related to the nodes (and nothing else : I want to keep the directive/rules/... setup), then re-accept the nodes afterwards.
For this worst case scenario, is there a way to do this cleanly?
A bit a data-exploration in the `nodes_info` table gave me that my oldests nodes all have a tremendeous amount of infos in their config_ids field.
I thought of keeping only the first and last entry of the given array. Is that a good way ?
Updated by Nicolas CHARLES 4 months ago
For nodes_info, you can safely purge all old data (old data == older than one agent run, except the latest entry).
If you even agree to have compliance unknown for the lenght of an agent run, you can truncate table nodes_info, and trigger a full generation - that would be much easier
Deleting the nodes wouldn't help you here.
Once you'll update to 5.0.12 or later, you'll also have better cleaning of old data (especially in the ldap directory)