Bug #4314
Closed
Inventories containing a very long (> 4096) process name cannot be sent to the Rudder server via CFEngine
Description
This issue was encountered while deploying v2.8.1, but it evidently affects at least all the versions shipped with cfengine-community 3.5.2.
The initial deployment of the rudder-agent package depends on the initial policy included in the rudder-agent rpm. One of the first steps is to run the inventory and submit the *.ocs file to the server. Until the file is accepted by the server, the client cannot get any updates to the policy, so the issue cannot be corrected by modifying the centralized policy.
In our environment there are quite a few hosts running java processes with very long command line argument lists. In fact, they are far longer than 4096 bytes, so /proc/<pid>/cmdline truncates them at 4096. The FusionInventory process collects them as-is, properly encloses the (truncated) command line in <CMD></CMD> tags and indents it, so the lines get even longer (a bit over 4100 bytes). Unfortunately, subsequent actions in the initial policy edit this file in various ways; as a result of these edits, the lines get truncated at 4096, losing the trailing </CMD>. The file is then transmitted successfully to the rudder server, and the task is marked as successfully completed (my understanding is that it will then not be repeated for at least 8 hours).
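The arithmetic behind this truncation can be sketched as follows. This is only an illustration: the 4096 limit comes from the description above, while the indentation width, the helper names `inventory_line` and `truncate`, and the assumption that the edit behaves like a plain cut at 4096 characters are all hypothetical.

```python
LIMIT = 4096  # line length limit described in this report

def inventory_line(cmd: str, indent: int = 6) -> str:
    """Wrap a command line in <CMD> tags; the indent width is an assumption."""
    return " " * indent + "<CMD>" + cmd + "</CMD>"

def truncate(line: str, limit: int = LIMIT) -> str:
    """Model the file edit as a plain cut at `limit` characters (assumption)."""
    return line[:limit]

cmd = "x" * 4096            # command line already capped by /proc/<pid>/cmdline
line = inventory_line(cmd)  # 6 + 5 + 4096 + 6 = 4113 characters
cut = truncate(line)

# The closing tag sits entirely beyond the limit, so it is lost.
print(len(line), len(cut), cut.endswith("</CMD>"))
```

Under these assumptions the closing tag is always cut off once the command line itself reaches 4096 characters, regardless of the indentation used.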
The file transmitted to the server is rejected because of an XML syntax error, so it is impossible to accept the host into Rudder.
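A quick way to confirm that a missing closing tag alone makes the file unparseable is a well-formedness check. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample documents and the `is_well_formed` helper are hypothetical and much smaller than a real inventory, and the server's own parser may report the error differently.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical, heavily shortened inventory fragments.
intact = "<PROCESSES>\n  <CMD>java -Dfoo=bar</CMD>\n</PROCESSES>"
truncated = "<PROCESSES>\n  <CMD>java -Dfoo=bar\n</PROCESSES>"

print(is_well_formed(intact), is_well_formed(truncated))
```

Running the same check on the *.ocs file on the agent, before it is sent, would show whether the file has already been damaged locally.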
This bug is apparently related to cfengine bug https://cfengine.com/dev/issues/3882: "Editing a file containing a line longer than 4096 chars will cause the text to be truncated". While it is not considered very serious in cfengine (one of the comments says: "Clarifying the title to be a bit less scary - even though this sucks, it doesn't break big files, just files with big lines..."), in rudder it seems to be a blocker, as it breaks the deployment process.
It may also be related to https://cfengine.com/dev/issues/3852#change-15347, which implicitly puts the blame on CF_BUFSIZE, limited to 4096.
This issue may also be related to http://www.rudder-project.org/redmine/issues/3838: if those long-running processes were intermittent, one of the inventory collections might get through. Unfortunately, in our case those processes run for months, so there is no chance of an inventory being collected without them.
As a workaround, I have modified the perl file /opt/rudder/share/fusioninventory/lib/FusionInventory/Agent/Tools/Unix.pm to make the command line a bit shorter, so that adding the xml tags and indentation does not trigger the cfengine bug:
--- /opt/rudder/share/fusioninventory/lib/FusionInventory/Agent/Tools/Unix.pm.orig 2014-01-02 18:46:15.252993839 +0000
+++ /opt/rudder/share/fusioninventory/lib/FusionInventory/Agent/Tools/Unix.pm 2014-01-02 18:50:23.954332253 +0000
@@ -288,6 +288,9 @@
my $time = $10;
my $cmd = $11;
+ # Alex: a workaround for the cfengine bug truncating lines longer than 4096 in edit_line
+ $cmd = substr $cmd,0,4076 if length($cmd) > 4076;
+
# try to get a consistant time format
my $begin;
if ($started =~ /^(\d{1,2}):(\d{2})/) {
Whether this fix leads to information loss is a separate issue (the command line is truncated by the OS anyway), but at least it allows the host to be accepted by the server.
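To check whether a given inventory file is at risk before it is sent, one could scan it for over-long lines. The following is a minimal sketch, not part of Rudder or FusionInventory: the `long_lines` helper is hypothetical, and flagging lines that merely reach 4096 bytes is a conservative assumption, since the exact boundary at which the truncation starts is not established in this report.

```python
import io

LIMIT = 4096  # line length limit described in this report

def long_lines(stream, limit=LIMIT):
    """Yield (line_number, length_in_bytes) for lines that reach the limit."""
    for number, raw in enumerate(stream, start=1):
        length = len(raw.rstrip(b"\r\n"))  # length without the line ending
        if length >= limit:  # conservative: exact boundary is an assumption
            yield number, length

# Hypothetical sample: one <CMD> line built from a 4096-byte command line.
sample = io.BytesIO(
    b"<A>\n" + b"  <CMD>" + b"x" * 4096 + b"</CMD>\n" + b"</A>\n"
)
print(list(long_lines(sample)))
```

For a real file, the stream would be opened in binary mode, for example `open(path, "rb")`, so that lengths are counted in bytes rather than characters.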