User story #4115

closed

PostgreSQL database corruption

Added by Daniel Stan over 10 years ago. Updated about 7 years ago.

Status: Rejected
Priority: 4
Assignee: -
Category: Server components

Description

Hello

We are facing an issue with our Rudder database, which seems to be corrupted. Each time we try to access the "node configuration" tab, we see this in the error log:

Oct 29 14:22:59 rudder postgres[16432]: [16-1] user=rudder,db=rudder ERROR:  could not read block 109354 of relation base/288068/288178: read only 5440 of 8192 bytes
Oct 29 14:22:59 rudder postgres[16432]: [16-2] user=rudder,db=rudder STATEMENT:  select executiondate, nodeid, ruleId, directiveid, serial, component, keyValue, executionTimeStamp, eventtype, policy, msg from RudderSysEvents join (select nodeid as Node, max(executiontimestamp) as Time from ruddersysevents where ruleId = 'hasPolicyServer-root' and component = 'common' and keyValue = 'EndRun' and executionTimeStamp > (now() - interval '15 minutes') group by nodeid ) as Ordering on Ordering.Node = nodeid and executionTimeStamp = Ordering.Time where 1=1 and ruleId = $1 and serial = $2 and executionTimeStamp > (now() - interval '15 minutes')

or:

Oct 29 14:23:39 rudder postgres[27246]: [2-1] user=[unknown],db=[unknown] LOG:  connection received: host=::1 port=51251
Oct 29 14:23:39 rudder postgres[27246]: [3-1] user=rudder,db=rudder LOG:  connection authorized: user=rudder database=rudder
Oct 29 14:23:39 rudder postgres[27245]: [5-1] user=rudder,db=rudder LOG:  disconnection: session time: 0:00:00.006 user=rudder database=rudder host=::1 port=51250
Oct 29 14:23:39 rudder postgres[27246]: [4-1] user=rudder,db=rudder ERROR:  could not read block 1 of relation base/288068/2605: Bad address
Oct 29 14:23:39 rudder postgres[27246]: [4-2] user=rudder,db=rudder STATEMENT:  insert into RudderSysEvents (executionDate, nodeId, ruleId, directiveId, serial, Component, KeyValue, executionTimeStamp, eventType, msg, Policy) values ('2013-10-29T14:23:29.033367+00:00','root', 'hasPolicyServer-root' , 'common-root', '82', 'common', 'StartRun', '2013-10-29 14:23:25+00:00', 'log_info', 'Start execution', 'common' )
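
As a rough aid to reading the two errors above, here is a minimal Python sketch that extracts the block number and relation path from a "could not read block" message and works out which file and byte offset the failed read points at. It is only an illustration under stated assumptions: the 8192-byte block size is taken from the log itself, the 1 GiB segment size (131072 blocks) and the 16384 threshold for system-reserved OIDs are PostgreSQL defaults that may differ on a custom build, and `describe_read_error` is a hypothetical helper name, not part of Rudder or PostgreSQL.

```python
import re

BLOCK_SIZE = 8192            # bytes per block; matches "of 8192 bytes" in the log
BLOCKS_PER_SEGMENT = 131072  # assumption: default 1 GiB segment files
FIRST_NORMAL_OID = 16384     # assumption: lower values are reserved for system objects

ERROR_RE = re.compile(
    r"could not read block (?P<block>\d+) of relation "
    r"(?P<path>base/\d+/(?P<filenode>\d+)): (?P<reason>.*)"
)


def describe_read_error(line):
    """Return where a failed block read points, or None if the line does not match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    block = int(match.group("block"))
    filenode = int(match.group("filenode"))
    # Large relations are split into segment files: <filenode>, <filenode>.1, ...
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    path = match.group("path")
    if segment > 0:
        path = f"{path}.{segment}"
    return {
        "file": path,  # relative to the PostgreSQL data directory
        "byte_offset": block_in_segment * BLOCK_SIZE,
        # Heuristic only: a filenode this low usually belongs to a system catalog.
        "likely_system_catalog": filenode < FIRST_NORMAL_OID,
        "reason": match.group("reason").strip(),
    }


if __name__ == "__main__":
    samples = [
        "ERROR:  could not read block 109354 of relation base/288068/288178: "
        "read only 5440 of 8192 bytes",
        "ERROR:  could not read block 1 of relation base/288068/2605: Bad address",
    ]
    for sample in samples:
        print(describe_read_error(sample))
```

Under those assumptions, the first error points at a short read inside the first segment file of relation 288178, while the second involves filenode 2605, which is low enough that it probably belongs to a system catalog rather than a Rudder table.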

I tried various ways to fix this:

- Restored from backup (no use; the corruption probably occurred before our oldest backup).
- Cleared the data from RudderSysEvents. This fixes the issue for a while, but the problem recurs once more data is added to that table.
- Recreated the indexes.
- Made a backup of the database with pg_dump and reimported it.
- Made a dump, dropped the rudder database, created a new rudder database, and reimported the dump. No use.

I searched the web for tips on how to fix the corruption but couldn't find anything useful. I suspect that the corruption may be somewhere in the PostgreSQL system tables or data files.

Do you have any ideas on how to fix this?
Do you think reinstalling PostgreSQL and reimporting the data from a pg_dump file may help?

Any help will be greatly appreciated.

Regards

Daniel Stan
