<p>Issue Tracker (https://issues.rudder.io/)</p>
<p>Rudder - Bug #4769: rudder-agent may be stucked by tokyo cabinet database bloating</p>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-04-23T13:18:12Z, https://issues.rudder.io/issues/4769?journal_id=26558</p>
<p>I have one example:</p>
<ol>
<li>With my promises on a new agent (cf_lock ~1 MB), a run takes 10 seconds, with no problems on reporting or anything else.</li>
<li>Over time, cf_lock grows big (~140 MB), and the agent:
<ul>
<li>now takes <strong>4 minutes</strong> when run by cf-execd;</li>
<li>still reports correctly, but the reports only arrive during the last minute before the next agent run;</li>
<li>stays fairly fast when run manually (~20 seconds).</li>
</ul>
</li>
<li>After waiting a little longer (in my case, cf_lock had reached ~155 MB):
<ul>
<li>when cf-agent is run by cf-execd, it now takes <strong>11 minutes</strong>;</li>
<li>reporting was never OK, some reports were always missing;</li>
<li>a manual run takes <strong>4 minutes</strong>.</li>
</ul></li>
</ol>
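<p>To reproduce these observations on another node, the cf_lock.tcdb size and the duration of a run can be logged together. The following is a minimal POSIX shell sketch; file_size_bytes and log_run_stats are hypothetical helpers, not part of Rudder, and the default path is the one used elsewhere in this issue.</p>

```shell
#!/bin/sh
# Hypothetical helpers (not part of Rudder): record the size of cf_lock.tcdb
# before a run and how long that run takes.

# Default path as used in this issue; override by setting CF_LOCK_DB.
CF_LOCK_DB=${CF_LOCK_DB:-/var/rudder/cfengine-community/state/cf_lock.tcdb}

# Print a file's size in bytes, or 0 if it does not exist.
# wc -c is POSIX; tr strips the padding some wc implementations add.
file_size_bytes() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Run the given command (output discarded), then print
# "<cf_lock.tcdb size in bytes before the run> <elapsed seconds>".
# Note: date +%s works with GNU and BSD date but is not POSIX.
log_run_stats() {
    size=$(file_size_bytes "$CF_LOCK_DB")
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo "$size $((end - start))"
}
```

<p>For example, log_run_stats /opt/rudder/bin/cf-agent -KI prints the lock database size in bytes and the duration in seconds of one manual run. Runs started by cf-execd are not measured by this sketch.</p>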
<p>In all cases, the last_successful_update file is updated correctly, even with the 11-minute runs.</p>
<p>Agents do not pile up in the first two cases; only the last one causes them to pile up, and even then there were only 4 running at the same time, well below 8 agents.</p>
<p>In all cases, cf-agent uses 100% CPU for the whole duration of the run, which in the last case means four CPUs are busy, impacting my other applications (maybe this is why the manually run agent cannot work correctly...).</p>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-04-23T13:50:53Z, https://issues.rudder.io/issues/4769?journal_id=26559</p>
<p>As said in <a class="issue tracker-1 status-5 priority-7 priority-lowest closed" title="Bug: cf_lock.tcdb is not cleaned by check-rudder-agent script when update file is older than 10 minutes (Released)" href="https://issues.rudder.io/issues/4752">#4752</a>, coredumb and dnns use a cron job that checks whether that file is larger than 10 MB and, if so, deletes (rm) the whole state directory.</p>
<p>They have not had any problems since, and have not seen any downside to it.</p>
<p>The size threshold and the cleaning method may not be the right ones, but I clearly think this is the idea we should add.</p>
<p>About the size: according to coredumb, the agent starts to slow down at 10 MB, and at 100 MB it is always slow.</p>
<p>I don't know whether there is a strict rule or whether it gets slower randomly... What I have seen is that the bigger the file, the slower the agent can be.</p>
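<p>The size check used by that cron job can be sketched as a small shell function. This is an illustration, not the actual script coredumb and dnns use: cf_lock_over_limit is a hypothetical name, and the 10 MB threshold is taken as 10 MiB (10485760 bytes), the value used in the cron entry later in this issue.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of Rudder): succeed when the given file is
# strictly larger than a limit in bytes.
# Default limit: 10 MiB = 10 * 1024 * 1024 = 10485760 bytes.
cf_lock_over_limit() {
    file=$1
    limit=${2:-10485760}
    # POSIX find: -size +Nc matches files strictly larger than N bytes.
    # Nothing is printed if the file is smaller or does not exist.
    [ -n "$(find "$file" -size +"$limit"c 2>/dev/null)" ]
}
```

<p>For example: cf_lock_over_limit /var/rudder/cfengine-community/state/cf_lock.tcdb && echo "cf_lock.tcdb is over 10 MiB". Using find -size with the c suffix keeps the check POSIX and avoids parsing ls -l output, whose column spacing varies.</p>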
<p>Some data points (these may need to be confirmed):</p>
<ul>
<li>&lt;1 MB: 10 seconds</li>
<li>35 MB: ~1 minute</li>
<li>140 MB: 4 minutes</li>
<li>150 MB: 11 minutes (maybe some corruption occurred here?)</li>
<li>230 MB: 10 minutes</li>
</ul>
<p>About the cleaning methods:</p>
<ul>
<li>I don't think we should delete the whole state directory.</li>
<li>I tried 'tchmgr optimize', but it changed nothing at all (same size, same run time); the same with the -df option.</li>
<li>I would only delete the cf_lock.tcdb files (that file and the tcdb.lock file).</li>
</ul>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-04-23T14:47:02Z, https://issues.rudder.io/issues/4769?journal_id=26560</p>
<p>Another lead would be to let CFEngine optimize the tcdb itself, using the TCDB_OPTIMIZE_PERCENT variable.</p>
<p>That variable makes CFEngine check whether it has to optimize the tcdb: <a class="external" href="https://github.com/cfengine/core/blob/master/libpromises/dbm_tokyocab.c#L128">https://github.com/cfengine/core/blob/master/libpromises/dbm_tokyocab.c#L128</a></p>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-04-23T15:25:15Z, https://issues.rudder.io/issues/4769?journal_id=26561</p>
<p>To use a cron job that deletes cf_lock when it is over 10 MB, put the following in the file /etc/cron.d/clean-rudder-locks:</p>
<pre>
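# Every 5 min: if cf_lock.tcdb is over 10 MiB (find avoids parsing ls output), delete it and its lock file, then run the agent once ignoring locks (-K)
*/5 * * * * root if [ -n "$(find /var/rudder/cfengine-community/state/cf_lock.tcdb -size +10485760c 2>/dev/null)" ]; then rm -f /var/rudder/cfengine-community/state/cf_lock.tcdb* && /opt/rudder/bin/cf-agent -KI > /dev/null 2>&1; fi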
</pre>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-04-24T13:26:48Z, https://issues.rudder.io/issues/4769?journal_id=26562</p>
<p>The rudder-clean works correctly on servers.</p>
<p>I have some nodes with TCDB_OPTIMIZE_PERCENT set to 25.</p>
<p>My tcdb files were reduced from 40 MB to 1 MB (execution time dropped from 1 minute to 10 seconds!), and the tcdb is now growing, but really slowly: from 1 MB to 1.9 MB in a few hours, whereas before it grew by 0.15 MB per run.</p>
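<p>A minimal sketch of how the variable can be set for a single run, assuming (as the dbm_tokyocab.c code linked above suggests) that cf-agent reads TCDB_OPTIMIZE_PERCENT from its environment. run_with_tcdb_optimize is a hypothetical wrapper, and its 0 to 100 range check is our own choice because the value is a percentage; it does not mirror CFEngine's own validation.</p>

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Rudder or CFEngine): run a command with
# TCDB_OPTIMIZE_PERCENT set. Assumes cf-agent reads it from its environment.
run_with_tcdb_optimize() {
    if [ $# -lt 2 ]; then
        echo "usage: run_with_tcdb_optimize PERCENT COMMAND [ARGS...]" >&2
        return 2
    fi
    percent=$1
    shift
    # Reject anything that is not a non-negative integer.
    case $percent in
        ''|*[!0-9]*) echo "not a number: $percent" >&2; return 2 ;;
    esac
    if [ "$percent" -gt 100 ]; then
        echo "out of range (0-100): $percent" >&2
        return 2
    fi
    # The assignment applies to the command's environment only.
    TCDB_OPTIMIZE_PERCENT=$percent "$@"
}
```

<p>For example: run_with_tcdb_optimize 25 /opt/rudder/bin/cf-agent -KI. For runs started by cf-execd, the variable would presumably have to be present in cf-execd's own environment, assuming the agents it launches inherit it.</p>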
<p>So I still need to check the behavior in the long run.</p>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-05-06T08:40:43Z, https://issues.rudder.io/issues/4769?journal_id=26700</p>
<p>The variable does not prevent cf_lock from growing big... The growth rate is slower (1 MB in 45 minutes instead of 30 minutes), but when the file gets big (over 10 MB), we get some errors too.</p>
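<p>As a rough estimate, assuming linear growth (a simplification) from about 1 MB, as on the optimized nodes above, the file reaches the 10 MB threshold after (10 - 1) × 30 = 270 minutes (4.5 hours) without the variable and (10 - 1) × 45 = 405 minutes (6.75 hours) with it. The same arithmetic as a hypothetical shell helper:</p>

```shell
#!/bin/sh
# Hypothetical helper: estimate the minutes until cf_lock.tcdb reaches a size
# threshold, assuming linear growth at a constant number of minutes per MB.
# Arguments (integers): current size in MB, threshold in MB, minutes per MB.
minutes_to_threshold() {
    current_mb=$1
    threshold_mb=$2
    minutes_per_mb=$3
    if [ "$current_mb" -ge "$threshold_mb" ]; then
        echo 0
    else
        echo $(( (threshold_mb - current_mb) * minutes_per_mb ))
    fi
}
```

<p>For example, minutes_to_threshold 1 10 45 prints 405.</p>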
<p>Checking the size seems to be the only solution to me, possibly helped by the variable.</p>
<p>Nicolas CHARLES (nicolas.charles@rudder.io), 2014-05-11T15:43:09Z, https://issues.rudder.io/issues/4769?journal_id=26749</p>
<p>The impact of deleting the cf_lock database is that all locks on promises are removed, meaning everything that relies on ifelapsed may not work as intended.<br />It has no impact on persistent classes or on the package list.</p>
<p>Nicolas CHARLES (nicolas.charles@rudder.io), 2014-05-11T17:52:08Z, https://issues.rudder.io/issues/4769?journal_id=26750</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Pending technical review</i></li><li><strong>Assignee</strong> changed from <i>Vincent MEMBRÉ</i> to <i>Jonathan CLARKE</i></li><li><strong>Pull Request</strong> set to <i>https://github.com/Normation/rudder-packages/pull/317</i></li></ul><p>The PR is here:<br /><a class="external" href="https://github.com/Normation/rudder-packages/pull/317">https://github.com/Normation/rudder-packages/pull/317</a></p>
<p>Jonathan CLARKE (jonathan.clarke@normation.com), 2014-05-11T21:28:06Z, https://issues.rudder.io/issues/4769?journal_id=26754</p>
<ul><li><strong>Status</strong> changed from <i>Pending technical review</i> to <i>Discussion</i></li><li><strong>Assignee</strong> changed from <i>Jonathan CLARKE</i> to <i>Nicolas CHARLES</i></li></ul>
<p>Nicolas CHARLES (nicolas.charles@rudder.io), 2014-05-13T07:57:18Z, https://issues.rudder.io/issues/4769?journal_id=26767</p>
<ul><li><strong>Status</strong> changed from <i>Discussion</i> to <i>Pending technical review</i></li><li><strong>Assignee</strong> changed from <i>Nicolas CHARLES</i> to <i>Jonathan CLARKE</i></li></ul><p>PR updated</p>
<p>Nicolas CHARLES (nicolas.charles@rudder.io), 2014-05-13T12:14:40Z, https://issues.rudder.io/issues/4769?journal_id=26778</p>
<ul><li><strong>Status</strong> changed from <i>Pending technical review</i> to <i>Pending release</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset packages:commit:838eace354d6e2be06c09536268fd596086fdb9d.</p>
<p>Jonathan CLARKE (jonathan.clarke@normation.com), 2014-05-13T12:14:41Z, https://issues.rudder.io/issues/4769?journal_id=26779</p>
<p>Applied in changeset packages:commit:0d10ba0561ec64aae16a359102b0c1d498926887.</p>
<p>Vincent MEMBRÉ (vme@rudder.io), 2014-06-06T16:30:37Z, https://issues.rudder.io/issues/4769?journal_id=27680</p>
<ul><li><strong>Status</strong> changed from <i>Pending release</i> to <i>Released</i></li></ul><p>This bug has been fixed in Rudder 2.9.5 (<a href="http://www.rudder-project.org/pipermail/rudder-announce/2014-June/000088.html" class="external">announcement</a>, <a href="http://www.rudder-project.org/foswiki/bin/view/System/Documentation:ChangeLog29" class="external">changelog</a>) and 2.10.1 (<a href="http://www.rudder-project.org/pipermail/rudder-announce/2014-June/000087.html" class="external">announcement</a>, <a href="http://www.rudder-project.org/foswiki/bin/view/System/Documentation:ChangeLog210" class="external">changelog</a>), which were released today.</p>
<ul>
<li>Download information: <a class="external" href="https://www.rudder-project.org/site/get-rudder/downloads/">https://www.rudder-project.org/site/get-rudder/downloads/</a></li>
</ul>