Bug #14190
closed
Inventory may never finish if there is a disk issue or invalid mountpoint
Added by Nicolas CHARLES almost 6 years ago.
Updated over 2 years ago.
Category:
Web - Nodes & inventories
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Getting started - demo | first install | Technique editor and level 1 Techniques
Description
The inventory can fail when some mount points are NFS shares and the NFS server is failing. In this case, inventories pile up and never finish (they are simply killed).
We should include a timeout within inventory for disk exploration.
- Target version changed from 4.3.9 to 4.3.10
- Target version changed from 4.3.10 to 4.3.11
- Target version changed from 4.3.11 to 4.3.12
- Related to Bug #14476: Inventory can not complete on an hypervisor if one of the guest machine is not accessible any more added
- Tags set to Sponsored
- Subject changed from Inventory may not finish if there is a disk issue or invalid mountpoint to Inventory may never finish if there is a disk issue or invalid mountpoint
- Description updated (diff)
- Target version changed from 4.3.12 to 5.0.10
- Severity set to Major - prevents use of part of Rudder | no simple workaround
- User visibility set to Getting started - demo | first install | Technique editor and level 1 Techniques
- Priority changed from 0 to 98
- Category changed from Packaging to Web - Nodes & inventories
- Assignee set to Nicolas CHARLES
- Target version changed from 5.0.10 to 5.0.11
The issue is in
[debug] Running FusionInventory::Agent::Task::Inventory::Linux::Drives
[debug2] executing df -P -T -k
Setting up an NFS share and then shutting down the NFS server reproduces it.
There is already a timeout, set with "alarm" through --backend-collect-timeout (default 180s), but it fails in this case.
alarm doesn't seem to work correctly with filehandles in a non-interactive script (see https://perldoc.perl.org/functions/alarm.html and https://www.perlmonks.org/?node_id=999179 )
Changing getFileHandle in Tools.pm, in the command case, to the following doesn't solve the issue:
eval {
    local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
    alarm 5;
    if (!open $handle, '-|', $params{command} . " 2>$nowhere") {
        $params{logger}->error(
            "Can't run command $params{command}: $ERRNO"
        ) if $params{logger};
        return;
    }
};
Actually, the open returns; the lock-up happens afterwards, during the read my $line = <$handle>;
Doing
my $line;
# get headers line first
eval {
    local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
    alarm 5;
    $line = <$handle>;
};
gets past that read to the next step. However, it still locks and the command doesn't end, probably because the handle is not released.
An easier solution could be to check whether timeout is present on the system: if so, run "timeout 5 df", and if not, run df without a timeout.
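The check described above could be sketched in shell roughly as follows. This is only an illustration and not the code from the pull request: the function name run_with_optional_timeout is hypothetical, and the 5-second limit is simply the value quoted above.

```shell
#!/bin/sh
# Sketch only: run_with_optional_timeout is a hypothetical helper name,
# not something taken from FusionInventory or Rudder.
#
# Usage: run_with_optional_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under "timeout SECONDS" when a timeout binary is found in PATH,
# and runs it unchanged otherwise.
run_with_optional_timeout() {
    limit="$1"
    shift
    if command -v timeout >/dev/null 2>&1; then
        timeout "$limit" "$@"
    else
        "$@"
    fi
}

# Example call, matching the df invocation seen in the debug log:
#   run_with_optional_timeout 5 df -P -T -k
```

With GNU coreutils, timeout exits with status 124 when the limit is reached, so a caller can tell a timed-out df from one that failed for another reason. Whether the signal actually ends a df blocked on a dead NFS mount depends on the mount options and kernel, which is not verified here.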
- Status changed from New to In progress
- Status changed from In progress to Pending technical review
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1909
- Status changed from Pending technical review to Pending release
- Status changed from Pending release to Released
- Priority changed from 98 to 97
This bug has been fixed in Rudder 5.0.11 which was released today.
- Related to Bug #18832: Rudder Agent consumes complete Memory because of fdisk added
- Priority changed from 97 to 89