https://issues.rudder.io/https://issues.rudder.io/themes/rudder7/favicon/favicon.ico?17096450182019-04-30T12:02:35ZIssue TrackerRudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1075812019-04-30T12:02:35ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>Hello,</p>
<p>Thanks for reporting that problem. I don't think the problem you are describing is a config one, more like a bug. Unfortunatly, our postgres expert is on hollidays this week. <br />I see here that the problem is with "select 1" which is the default request to test connection availability used by the pool manager. I will start to look for that.</p>
<p>Is there other request in errors? Or perhaps do you have more information on the table impacted? It would help a lot to understand why some transaction get stalled (and I'm sorry I can't help more for now).</p>
<p>Also: does the problem started concurrently with some special event? (I would understand that you don't have the least idea about that).</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1075822019-04-30T12:26:09ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>Thinking a bit more about it, the select 1 may be normal. As it is the query used by the pool to test connection health, it's normal to have some. I'm not sure about the status.</p>
<p>So, do you have other request? Where are reported the "dead row versions cannot be removed yet" entries if you try a VACUUM FULL VERBOSE ANALYZE?</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1075832019-04-30T13:32:30ZTobias Ell
<ul><li><strong>File</strong> <a href="/attachments/1478">vacuum_rudder_full.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/1478/vacuum_rudder_full.log">vacuum_rudder_full.log</a> added</li></ul><p>Hi Francois,</p>
<p>I see the dead rows on many tables - I attach the vaccum log file.<br />The problem with the "select 1" connection is the state "idle in transaction".<br />This seems to block vaccuming on all tables. If the query is to test connection<br />availability this should not start a transaction.<br />Like this vaccum only works while jetty is stopped.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1075922019-04-30T20:24:08ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Category</strong> set to <i>Web - Maintenance</i></li><li><strong>Target version</strong> set to <i>5.0.10</i></li></ul><p>OK, it looks like there is a bunch of dead row in a lot of tables.</p>
<p>You are right that there is no need for the transaction for the test, but I believe it's how how it's done by default (and we didn't changed it). We could change that, or we could even update to JDBC4 driver which has a built-in function to do that (). But it should not locks all these tables, the select isn't even done a on a specific table. I will need to find a way to reproduce the vaccuum problem to be sure it actually solves it.</p>
<p>What happen if you kill that transaction request which is in "idle transaction" at the OS level and then start back the VACUUM? IE, in postgres pg_stat_activity, get the "pid" column (8071 in your example), and in a console: <br />kill 8071 <br />(actually, you can find the "idle in transaction" process and its pid directly in console: ps aux | grep "idle in transaction").</p>
<p>If the VACUUM works, it would be a good evidence that the problem is with the SELECT 1.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076022019-05-02T09:52:19ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Assignee</strong> set to <i>François ARMAND</i></li></ul> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076082019-05-02T11:08:17ZTobias Ell
<ul></ul><p>Oh the problem is with the "SELECT 1" - I have tested by ending the query with pg_terminate_backend(8071) and<br />checked by ending jetty (which owns the connection from local port 48393).<br />As soon as the "SELECT 1" is gone - vaccum works.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076092019-05-02T11:48:20ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>Just to be sure I understand: VACUUM works before you kill jetty? Because it could be an other request done by Rudder which is also cleared when you kill jetty that block things.</p>
<p>OK, so I'm going to try to see if we can use something else (or at least not in a transaction). Thanks for both reporting and helping with the diagnostic!</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076102019-05-02T11:55:30ZTobias Ell
<ul></ul><p>VACCUM works if "SELECT 1" with state "idle in transaction" is gone.<br />I can do this by<br />a) ending the postgres process<br />or<br />b) ending jetty</p>
<p>With a "idle in transaction" query vaccum reports dead row versions and does not free disk space.</p>
<p>Is the "Seclect 1" only done at jetty start?<br />I ended the query 1 hour ago and it hasn't come back, yet.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076122019-05-02T13:15:41ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>OK, thank you very much for the clarification, there's no doubt it is the culprit.</p>
<p>"SELECT 1" is used in the connexion pool (hikaricp, for the records) to test if a connexion is alive (and spawn the correct number based on pool size). It should do it at regular interval.</p>
<p>So maybe there is a set of condition needed to have a select 1 stuck. It could be linked with something that happens only at startup (for ex something is not fully ready, and it breaks).</p>
<p>Now that the culprit is clearly identified, we know what to change. Thank you!</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076272019-05-02T16:19:11ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Tracker</strong> changed from <i>Question</i> to <i>Bug</i></li><li><strong>Severity</strong> set to <i>Major - prevents use of part of Rudder | no simple workaround</i></li><li><strong>User visibility</strong> set to <i>Operational - other Techniques | Rudder settings | Plugins</i></li><li><strong>Priority</strong> set to <i>52</i></li></ul> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076282019-05-02T16:19:20ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In progress</i></li></ul> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076332019-05-02T16:41:46ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Status</strong> changed from <i>In progress</i> to <i>Pending technical review</i></li><li><strong>Assignee</strong> changed from <i>François ARMAND</i> to <i>Vincent MEMBRÉ</i></li><li><strong>Pull Request</strong> set to <i>https://github.com/Normation/rudder/pull/2211</i></li></ul><p>PR <a class="external" href="https://github.com/Normation/rudder/pull/2211">https://github.com/Normation/rudder/pull/2211</a></p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076362019-05-02T16:50:51ZFrançois ARMANDfrancois.armand@rudder.io
<ul><li><strong>Status</strong> changed from <i>Pending technical review</i> to <i>Pending release</i></li></ul><p>Applied in changeset <a class="changeset" title="Fixes #14789: postgres fills up and can't be vaccumed" href="https://issues.rudder.io/projects/rudder/repository/rudder/revisions/3a075dbe6f77be149da04ffb8e778a9e934ff28a">rudder|3a075dbe6f77be149da04ffb8e778a9e934ff28a</a>.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1076372019-05-02T16:52:44ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>We have commited a patch that avoid using SELECT 1 as documented here: <a class="external" href="https://github.com/brettwooldridge/HikariCP">https://github.com/brettwooldridge/HikariCP</a> (search for `connectionTestQuery`). You should be able to test with tomorrow nightly if it correct the problem for you.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1077672019-05-03T09:48:52ZTobias Ell
<ul></ul><p>Thank you - I will test the patch next week.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1077962019-05-06T08:25:08ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>Idle in transaction means that a transaction began, and never ended.<br />So here a query started, and connection pool checked the validity of connection with select 1 (hiding previous query)</p>
<p>Removing the SELECT 1 may or may not fix the issue, but at least it will allow to see the actual query being performed</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1078022019-05-06T08:59:21ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>Tobias, I have one last question for you<br />Do you happen to have messages like<br /><pre>
2019-04-19 00:01:31 CEST [11185]: [1-1] LOG: sending cancel to blocking autovacuum PID 14822
2019-04-19 00:01:31 CEST [11185]: [2-1] DETAIL: Process 11185 waits for AccessExclusiveLock on relation 16512 of database 16386.
2019-04-19 00:01:31 CEST [11185]: [3-1] STATEMENT: vacuum full archivedruddersysevents
2019-04-19 00:01:31 CEST [14822]: [1-1] ERROR: canceling autovacuum task
2019-04-19 00:01:31 CEST [14822]: [2-1] CONTEXT: automatic analyze of table "rudder.public.archivedruddersysevents"
</pre> in your postgresql log files ? (most probably, in /var/log/postgresql)</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1078712019-05-06T11:48:12ZTobias Ell
<ul></ul><p>I do have<br />LOG: checkpoints are occurring too frequently (18 seconds apart)<br />HINT: Consider increasing the configuration parameter "checkpoint_segments".<br />ERROR: canceling autovacuum task<br />CONTEXT: automatic vacuum of table "rudder.public.archivedruddersysevents"</p>
<p>although my housekeeping script does not have errors anymore (production).</p>
<p>I am setting up a test environment to check the nightly.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1078842019-05-06T14:08:44ZTobias Ell
<ul></ul><p>I installed latest nightly on the test server and since restart I haven't found<br />a "SELECT 1" or "idle in transaction". Looks good.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1080642019-05-09T07:52:02ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>Hello, would you mind looking if you have a problem now? As we don't know exactly what triggered the problem, we need to check after some time. <br />So:</p>
<p>- do you have a query in "idle in transaction" (other than SELECT 1, since we replaced it) ?<br />- does the VACUUM say it can perform everything without dead row ?</p>
<p>Thanks !</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1080652019-05-09T08:25:34ZTobias Ell
<ul></ul><p>Hello,</p>
<p>there are no "idle in transaction" queries ( and "SELECT 1" hasn't come up again either).<br />All vacuum operations have finished without warnings about "dead rows".</p>
<p>Fix seems to work - thank you.</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1080672019-05-09T08:43:46ZFrançois ARMANDfrancois.armand@rudder.io
<ul></ul><p>That's great, thank you for letting us know. We are going to cut a release in the next couple of days with that fixe. Thank you again!</p> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1086112019-05-14T10:05:57ZVincent MEMBRÉvme@rudder.io
<ul><li><strong>Subject</strong> changed from <i>postgres fills up and can't be vaccumed</i> to <i>Postgres fills up and can't be vaccumed</i></li></ul> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1086442019-05-14T13:00:22ZVincent MEMBRÉvme@rudder.io
<ul><li><strong>Status</strong> changed from <i>Pending release</i> to <i>Released</i></li></ul><p>This bug has been fixed in Rudder 5.0.10 which was released today.</p>
<ul>
<li><a href="https://docs.rudder.io/changelogs/5.0/index.html#5.0.10" class="external">5.0.10 changelog</a></li>
<li><a href="https://docs.rudder.io/reference/5.0/installation/upgrade.html" class="external">Upgrade manual</a></li>
</ul> Rudder - Bug #14789: Postgres fills up and can't be vaccumedhttps://issues.rudder.io/issues/14789?journal_id=1130302019-08-06T09:15:30ZNicolas CHARLESnicolas.charles@rudder.io
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-16 priority-default closed parent" href="/issues/15413">Bug #15413</a>: Autovacuum may never finish, or take more than one day, on busy Rudder instance, because of resources attrition</i> added</li></ul>