https://issues.rudder.io/https://issues.rudder.io/themes/rudder7/favicon/favicon.ico?17096450182016-01-15T17:42:01ZIssue TrackerRudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508542016-01-15T17:42:01ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>logs show this message<br /> error: Empty first server packet when listing directory '/var/rudder/share/c45e0e9f-5d9b-4685-b26c-35a3dcf35954/rules/cfengine-community/clockConfiguration/3.0'!</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508552016-01-15T17:42:50ZNicolas CHARLESnicolas.charles@rudder.io
<ul><li><strong>Assignee</strong> set to <i>Alexis Mousset</i></li></ul> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508562016-01-15T17:47:59ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>server did not crash<br /><pre>
root 6639 0.2 0.4 458236 7320 ? Ss 15:28 0:17 /var/rudder/cfengine-community/bin/cf-serverd
</pre></p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508572016-01-15T17:49:40ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>promise generation was at <br /><pre>
2016-01-15 17:09:51.786+00
</pre><br />but the failing copy at<br /><pre>
2016-01-15 17:11:37+00:00
</pre></p>
<p>so this should not be related</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508582016-01-15T17:51:14ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>This seems to be quite common on my relay<br /><pre>
cf_relay_rudder_local__1452876910_Fri_Jan_15_16_55_10_2016_0x7fda26b9e700: error: Empty first server packet when listing directory '/var/rudder/configuration-repository/shared-files'!
cf_relay_rudder_local__1452877517_Fri_Jan_15_17_05_17_2016_0x7fda26b9e700: error: Empty first server packet when listing directory '/usr/share/ncf/tree/30_generic_methods'!
cf_relay_rudder_local__1452879326_Fri_Jan_15_17_35_26_2016_0x7fda26b9e700: error: Empty first server packet when listing directory '/usr/share/ncf/tree/60_services/baseline/stable'!
</pre></p>
<p>(note that the failing node was directly connected to the server)</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508592016-01-15T17:58:23ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>On the relay, it did detect that the copy was invalid, but not on the node, (or a wring condition is used on the node)<br />so the file rudder_promises_generated remains, and it is never solved (until next regeneration)</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508602016-01-17T19:54:44ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>This also happens on CFEngine Enterprise 3.7.2</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508612016-01-17T20:18:26ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>More details:<br />On se server side:<br /><pre>
rudder verbose: 192.168.41.3> Host relay.rudder.local granted access to /var/rudder/configuration-repository/ncf/ncf_hash_file
rudder verbose: 192.168.41.3> Filename /var/rudder/configuration-repository/ncf/ncf_hash_file is resolved to /var/rudder/configuration-repository/ncf/ncf_hash_file
rudder verbose: 192.168.41.3> Found a matching rule in access list (/var/rudder/configuration-repository/ncf/ncf_hash_file in /var/rudder/configuration-repository/ncf)
rudder verbose: 192.168.41.3> Mapping root privileges to access non-root files
rudder verbose: 192.168.41.3> Host relay.rudder.local granted access to /var/rudder/configuration-repository/ncf/ncf_hash_file
rudder verbose: Obtained IP address of '192.168.41.3' on socket 9 from accept
rudder verbose: New connection (from 192.168.41.3, sd 9), spawning new thread...
rudder info: 192.168.41.3> Accepting connection
rudder verbose: 192.168.41.3> Setting socket timeout to 600 seconds.
rudder verbose: 192.168.41.3> Peeked CAUTH in TCP stream, considering the protocol as Classic
rudder verbose: 192.168.41.3> Peer's identity is: MD5=00600d9ee1389841e95ad24fe88c1d68
rudder verbose: 192.168.41.3> A public key was already known from relay.rudder.local/192.168.41.3 - no trust required
rudder verbose: 192.168.41.3> The public key identity was confirmed as root@relay.rudder.local
rudder verbose: 192.168.41.3> Authentication of client relay.rudder.local/192.168.41.3 achieved
rudder verbose: 192.168.41.3> Filename /usr/share/ncf/tree is resolved to /usr/share/ncf/tree
rudder verbose: 192.168.41.3> Found a matching rule in access list (/usr/share/ncf/tree in /usr/share/ncf/tree)
rudder verbose: 192.168.41.3> Mapping root privileges to access non-root files
rudder verbose: 192.168.41.3> Host relay.rudder.local granted access to /usr/share/ncf/tree
rudder verbose: 192.168.41.3> Filename /usr/share/ncf/tree is resolved to /usr/share/ncf/tree
rudder verbose: 192.168.41.3> Found a matching rule in access list (/usr/share/ncf/tree in /usr/share/ncf/tree)
rudder verbose: 192.168.41.3> Mapping root privileges to access non-root files
rudder verbose: 192.168.41.3> Host relay.rudder.local granted access to /usr/share/ncf/tree
rudder verbose: Obtained IP address of '192.168.41.3' on socket 10 from accept
rudder verbose: New connection (from 192.168.41.3, sd 10), spawning new thread...
rudder info: 192.168.41.3> Accepting connection
rudder verbose: 192.168.41.3> Setting socket timeout to 600 seconds.
rudder verbose: 192.168.41.3> Peeked CAUTH in TCP stream, considering the protocol as Classic
rudder verbose: 192.168.41.3> Peer's identity is: MD5=00600d9ee1389841e95ad24fe88c1d68
rudder verbose: 192.168.41.3> A public key was already known from relay.rudder.local/192.168.41.3 - no trust required
rudder verbose: 192.168.41.3> The public key identity was confirmed as root@relay.rudder.local
rudder verbose: 192.168.41.3> Authentication of client relay.rudder.local/192.168.41.3 achieved
</pre></p>
<p>on the client side<br /><pre>
R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@16@@Binaries update@@None@@2016-01-17 20:14:04+00:00##5f852b2b-3120-4a69-8f44-f4e403b6c9ad@#The CFengine binaries in /var/rudder/cfengine-community/bin are up to date
error: Empty first server packet when listing directory '/usr/share/ncf/tree'!
R: @@DistributePolicy@@result_na@@5f852b2b-3120-4a69-8f44-f4e403b6c9ad-DP@@5f852b2b-3120-4a69-8f44-f4e403b6c9ad-distributePolicy@@13@@Configure ncf@@None@@2016-01-17 20:14:04+00:00##5f852b2b-3120-4a69-8f44-f4e403b6c9ad@#Configuration of ncf is not necessary on non root servers
R: @@DistributePolicy@@result_error@@5f852b2b-3120-4a69-8f44-f4e403b6c9ad-DP@@5f852b2b-3120-a69-8f44-f4e403b6c9ad-distributePolicy@@13@@Propagate promises@@None@@2016-01-17 20:14:04+00:00##5f852b2b-3120-4a69-8f44-f4e403b6c9ad@#Cannot propagate common ncf promises
</pre></p>
<p>It looks as if the connection is reinitialized, or agent tries to connect twice.</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508622016-01-17T20:43:59ZNicolas CHARLESnicolas.charles@rudder.io
<ul><li><strong>File</strong> <a href="/attachments/972">debug</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/972/debug">debug</a> added</li><li><strong>Priority</strong> changed from <i>N/A</i> to <i>1</i></li></ul><p>Attached, debug output from the server.<br />The client failed to fetch error: Empty first server packet when listing directory '/usr/share/ncf/tree/60_services/baseline/stable'!</p>
<p>I'm wondering if it is not the agent that fails ..</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508632016-01-17T20:46:13ZNicolas CHARLESnicolas.charles@rudder.io
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/50863/diff?detail_id=63218">diff</a>)</li></ul> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508642016-01-18T08:35:34ZJonathan CLARKEjonathan.clarke@normation.com
<ul></ul><p>This is indeed worrisome.</p>
<p>I found the origin of that message in cf-agent code:</p>
<pre>
libcfnet/client_code.c:354:
293 /* Returning NULL (an empty list) does not mean empty directory but ERROR,
294 * since every directory has to contain at least . and .. */
295 Item *RemoteDirList(const char *dirname, bool encrypt, AgentConnection *conn)
296 {
...
351 if (recvbuffer[0] == '\0')
352 {
353 Log(LOG_LEVEL_ERR,
354 "Empty%s server packet when listing directory '%s'!",
355 (start == NULL) ? " first" : "",
356 dirname);
357 goto err;
358 }
...
405 err: /* free list */
406 for (Item *ip = start; ip != NULL; ip = start)
407 {
408 start = ip->next;
409 free(ip->name);
410 free(ip);
411 }
</pre>
<p>So there is indeed an error printed out, but as far as I can tell the connection is not marked as bad or closed.</p>
<p>The cause for the error is a packet from the server containing exactly nothing (just the final '\0' byte). This basically means the error is on the server side, because it is either sending an empty packet when it shouldn't, or <strong>possibly</strong> sending the right packet that <strong>happens</strong> to have a "\0" as it's first byte.</p>
<p>Nico, what exact versions of the agent and the server are you using? Have you tried varying server versions or just agent versions in above tests?</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508652016-01-18T08:39:35ZJonathan CLARKEjonathan.clarke@normation.com
<ul></ul><p>If you can reproduce this pretty often, we can build a patched version of cf-serverd (and cf-agent, if needed) and print debug information to figure out what is really going on. Or just run cf-serverd under gdb, but that could be very tiresome if we don't know whether the bug is occuring or not in a given run, since don't yet have a way of detecting that on the server side.</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508662016-01-18T08:42:49ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>so far, i reproduiced it with any combinaision of 3.7.x <-> 3.7.x<br />I'm building a 3.6 to validate there</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508672016-01-18T08:43:20ZJonathan CLARKEjonathan.clarke@normation.com
<ul></ul><p>Could you run a find on the directories that cause this "Empty first server packet when listing directory" on the server please? That is, from your logs above:</p>
<ul>
<li>/usr/share/ncf/tree/60_services/baseline/stable</li>
<li>/usr/share/ncf/tree</li>
</ul>
<p>at least, and attach them to this bug?</p>
<p>One option I see from reading the source code is a filename longer than 256 chars:</p>
<pre>
int CfOpenDirectory(ServerConnectionState *conn, char *sendbuffer, char *oldDirname)
...
/* TODO fix copying names greater than 256. */
</pre> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508692016-01-18T12:09:14ZAlexis Moussetalexis.mousset@rudder.io
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In progress</i></li></ul><p>The problem is that the check for the NULL character is done <strong>before</strong> decrypting the server response (<a class="external" href="https://github.com/cfengine/core/blob/3.7.x/libcfnet/client_code.c#L351">https://github.com/cfengine/core/blob/3.7.x/libcfnet/client_code.c#L351</a>). Preparing a fix.</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508702016-01-18T12:27:00ZAlexis Moussetalexis.mousset@rudder.io
<ul></ul><p>Upstream PR: <a class="external" href="https://github.com/cfengine/core/pull/2475">https://github.com/cfengine/core/pull/2475</a>. Preparing a fix for Rudder 3.2.</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508722016-01-18T13:28:34ZAlexis Moussetalexis.mousset@rudder.io
<ul><li><strong>Target version</strong> set to <i>3.2.0</i></li></ul> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508742016-01-18T13:30:29ZAlexis Moussetalexis.mousset@rudder.io
<ul><li><strong>Status</strong> changed from <i>In progress</i> to <i>Pending technical review</i></li><li><strong>Assignee</strong> changed from <i>Alexis Mousset</i> to <i>Jonathan CLARKE</i></li><li><strong>Pull Request</strong> set to <i>https://github.com/Normation/rudder-packages/pull/862</i></li></ul><p>PR <a class="external" href="https://github.com/Normation/rudder-packages/pull/862">https://github.com/Normation/rudder-packages/pull/862</a></p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508782016-01-18T13:36:23ZAlexis Moussetalexis.mousset@rudder.io
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-16 priority-default closed" href="/issues/7796">Bug #7796</a>: Rudder agent does not detect when promise copy failed</i> added</li></ul> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508792016-01-18T13:41:09ZAlexis Moussetalexis.mousset@rudder.io
<ul><li><strong>Status</strong> changed from <i>Pending technical review</i> to <i>Pending release</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Applied in changeset <a class="changeset" title="Fixes #7793: Random wrong or partial copy of promises from node to server" href="https://issues.rudder.io/projects/rudder/repository/rudder-packages/revisions/c367f984ebb50f3252ec5d27bcb1cfdd6aef71bc">rudder-packages|c367f984ebb50f3252ec5d27bcb1cfdd6aef71bc</a>.</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508872016-01-18T15:23:59ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>\o/</p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=508952016-01-18T17:23:59ZNicolas CHARLESnicolas.charles@rudder.io
<ul></ul><p>related cfengine ticket<br /><a class="external" href="https://dev.cfengine.com/issues/7908">https://dev.cfengine.com/issues/7908</a></p> Rudder - Bug #7793: Random wrong or partial copy of promises from node to serverhttps://issues.rudder.io/issues/7793?journal_id=516302016-01-22T15:56:25ZVincent MEMBRÉvme@rudder.io
<ul><li><strong>Status</strong> changed from <i>Pending release</i> to <i>Released</i></li></ul><p>This bug has been fixed in Rudder 3.2.0 which were released today. </p>
<ul>
<li>3.2: <a href="http://www.rudder-project.org/pipermail/rudder-announce/2016-January/000176.html" class="external">Announce</a> <a href="http://www.rudder-project.org/foswiki/bin/view/System/Documentation:ChangeLog32" class="external">Changelog</a></li>
<li>Download: <a class="external" href="https://www.rudder-project.org/site/get-rudder/downloads/">https://www.rudder-project.org/site/get-rudder/downloads/</a></li>
</ul>