Project

General

Profile

Question #15436

Updated by Kévin Mèche over 5 years ago

Hello, 

 After doing database maintenance operations (as per written in FAQ (1)), I explored a bit the database, to see which tables are taking the most space. 
 (For information, my RudderMachine has ~60GB of storage) 
 <pre><code class="shell"> 
 rudder=# SELECT pg_size_pretty( pg_database_size('rudder') ); 
  pg_size_pretty 
 ---------------- 
  40 GB 
 </code></pre> 

 So I checked for the heavyweight tables, and the winners are : 
 <pre><code class="shell"> 
 rudder=#    SELECT relname AS "Table", pg_size_pretty(pg_total_relation_size(relid)) AS "Size", pg_size_pretty(pg_relation_size(relid)) AS "Relation size",    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size" FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5; 
           Table            |    Size     | Relation size | External Size 
 -------------------------+---------+---------------+--------------- 
  nodes_info                | 16 GB     | 4392 kB         | 16 GB 
  nodes                     | 14 GB     | 3405 MB         | 10 GB 
  archivedruddersysevents | 3237 MB | 2706 MB         | 530 MB 
  ruddersysevents           | 2994 MB | 1429 MB         | 1565 MB 
  archivednodecompliance    | 842 MB    | 699 MB          | 143 MB 
 </code></pre> 
 So, I do have 2 tables using half of the DB weight. 
 I did a bit of exploration (and thanks to ticket #9518), and found out that some nodes have a long history stored in db, even for deleted reports or old configurations checks. reports. 

 For the table `nodes`, it seems I can remove safely historical entries from nodes with a request like <code class="sql">DELETE FROM nodes WHERE endtime < (now() - interval '7 days')</code>, but I fear it might break something, so I won't do. 
 <pre><code class="shell"> 
 rudder=# select count(*) from nodes where endtime < (now() - interval '7 days'); 
   count 
 ---------- 
  29083989 
 </code></pre> 

 So, my question is: 
 - is the request written above (or a cleaner one) viable ? 
 - is there a possibility to clean the `nodes_info` table of what seems to be out of date values ? 

 Warm regards, 

 --- 
 1. https://faq.rudder.io/knowledge-bases/2/articles/14-database-is-using-too-much-space

Back