Project

General

Profile

Actions

Question #15436

closed

How to clean the tables 'nodes' and 'nodes_info' of seemingly out of date informations ?

Added by Kévin Mèche over 5 years ago. Updated 7 months ago.

Status:
Resolved
Priority:
N/A
Assignee:
-
Category:
Web - Maintenance
Target version:
-
Regression:
No

Description

Hello,

After doing database maintenance operations (as per written in FAQ (1)), I explored a bit the database, to see which tables are taking the most space.
(For information, my RudderMachine has ~60GB of storage, and manages 113 nodes exactly.)

rudder=# SELECT pg_size_pretty( pg_database_size('rudder') );
 pg_size_pretty
----------------
 40 GB

So I checked for the heavyweight tables, and the winners are :

rudder=#  SELECT relname AS "Table", pg_size_pretty(pg_total_relation_size(relid)) AS "Size", pg_size_pretty(pg_relation_size(relid)) AS "Relation size",  pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size" FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5;
          Table          |  Size   | Relation size | External Size
-------------------------+---------+---------------+---------------
 nodes_info              | 16 GB   | 4392 kB       | 16 GB
 nodes                   | 14 GB   | 3405 MB       | 10 GB
 archivedruddersysevents | 3237 MB | 2706 MB       | 530 MB
 ruddersysevents         | 2994 MB | 1429 MB       | 1565 MB
 archivednodecompliance  | 842 MB  | 699 MB        | 143 MB

So, I do have 2 tables using half of the DB weight.
I did a bit of exploration (and thanks to ticket #9518), and found out that some nodes have a long history stored in db, even for deleted reports or old configurations checks.

For the table `nodes`, it seems I can remove safely historical entries from nodes with a request like DELETE FROM nodes WHERE endtime < (now() - interval '7 days'), but I fear it might break something, so I won't do.

rudder=# select count(*) from nodes where endtime < (now() - interval '7 days');
  count
----------
 29083989

So, my question is:
- is the request written above (or a cleaner one) viable ?
- is there a possibility to clean the `nodes_info` table of what seems to be out of date values ?

Warm regards,

---
1. https://faq.rudder.io/knowledge-bases/2/articles/14-database-is-using-too-much-space


Related issues 1 (0 open1 closed)

Related to Rudder - Bug #17778: table nodes contains on entry per node per generation, which is too muchReleasedFrançois ARMANDActions
Actions

Also available in: Atom PDF