User story #4115
PostgreSQL database corruption (closed)
Status: Rejected
Priority: 4
Assignee: -
Category: Server components
Description
Hello
We are facing an issue with our Rudder database, which seems to be corrupted. Each time we try to access the "node configuration" tab, we see this in the error log:
Oct 29 14:22:59 rudder postgres[16432]: [16-1] user=rudder,db=rudder ERROR: could not read block 109354 of relation base/288068/288178: read only 5440 of 8192 bytes
Oct 29 14:22:59 rudder postgres[16432]: [16-2] user=rudder,db=rudder STATEMENT: select executiondate, nodeid, ruleId, directiveid, serial, component, keyValue, executionTimeStamp, eventtype, policy, msg from RudderSysEvents join (select nodeid as Node, max(executiontimestamp) as Time from ruddersysevents where ruleId = 'hasPolicyServer-root' and component = 'common' and keyValue = 'EndRun' and executionTimeStamp > (now() - interval '15 minutes') group by nodeid ) as Ordering on Ordering.Node = nodeid and executionTimeStamp = Ordering.Time where 1=1 and ruleId = $1 and serial = $2 and executionTimeStamp > (now() - interval '15 minutes')
or:
Oct 29 14:23:39 rudder postgres[27246]: [2-1] user=[unknown],db=[unknown] LOG: connection received: host=::1 port=51251
Oct 29 14:23:39 rudder postgres[27246]: [3-1] user=rudder,db=rudder LOG: connection authorized: user=rudder database=rudder
Oct 29 14:23:39 rudder postgres[27245]: [5-1] user=rudder,db=rudder LOG: disconnection: session time: 0:00:00.006 user=rudder database=rudder host=::1 port=51250
Oct 29 14:23:39 rudder postgres[27246]: [4-1] user=rudder,db=rudder ERROR: could not read block 1 of relation base/288068/2605: Bad address
Oct 29 14:23:39 rudder postgres[27246]: [4-2] user=rudder,db=rudder STATEMENT: insert into RudderSysEvents (executionDate, nodeId, ruleId, directiveId, serial, Component, KeyValue, executionTimeStamp, eventType, msg, Policy) values ('2013-10-29T14:23:29.033367+00:00','root', 'hasPolicyServer-root' , 'common-root', '82', 'common', 'StartRun', '2013-10-29 14:23:25+00:00', 'log_info', 'Start execution', 'common' )
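To make the two errors above easier to compare, here is a small sketch that pulls the database OID, relation filenode and block number out of each message and works out the byte offset of the failing block. The helper name decode_read_error is made up for illustration. The 8192-byte block size is taken from the first error message. The 16384 threshold is an assumption about how PostgreSQL numbers objects created at initdb, and it only hints at a system catalog if the file was never rewritten.

```python
import re

# Block size as reported in the first error ("read only 5440 of 8192 bytes").
BLOCK_SIZE = 8192
# Assumption: OIDs below 16384 are reserved for objects created at initdb,
# so a filenode under this value may belong to a system catalog.
FIRST_NORMAL_OID = 16384

PATTERN = re.compile(
    r"could not read block (?P<block>\d+) of relation "
    r"base/(?P<db_oid>\d+)/(?P<filenode>\d+)"
)


def decode_read_error(line):
    """Return the parts of a 'could not read block' error, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    block = int(m.group("block"))
    filenode = int(m.group("filenode"))
    return {
        "db_oid": int(m.group("db_oid")),
        "filenode": filenode,
        "block": block,
        # Offset of the failing block from the start of the relation.
        "byte_offset": block * BLOCK_SIZE,
        "maybe_system_catalog": filenode < FIRST_NORMAL_OID,
    }


errors = [
    "ERROR: could not read block 109354 of relation base/288068/288178: "
    "read only 5440 of 8192 bytes",
    "ERROR: could not read block 1 of relation base/288068/2605: Bad address",
]

for line in errors:
    print(decode_read_error(line))
```

The first error points at a high-numbered filenode (288178) and a short read partway through a block, while the second points at a low-numbered filenode (2605), which may be a system catalog. If the server still answers queries, the filenode could be matched to a relation name through the relfilenode column of pg_class.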
I tried various ways to fix this:
- restored from backup (no use; the corruption probably occurred before our oldest backup)
- cleared the data from RudderSysEvents. This fixes the issue for a while, but once more data is added to that table the problem reoccurs.
- recreated the indexes.
- made a backup of the database with pg_dump and reimported it.
- made a dump, deleted the rudder database, created a new rudder database, and reimported the dump. No use.
I searched the web for tips on how to fix the corruption but I can't find anything useful. I suspect that the corruption may be somewhere in the PostgreSQL system tables or data files.
Do you have any ideas on how to fix this?
Do you think reinstalling postgres and reimporting the data from a pg_dump file may help?
Any help will be greatly appreciated.
Regards
Daniel Stan