Bug #14190

Inventory may never finish if there is a disk issue or invalid mountpoint

Added by Nicolas CHARLES 5 months ago. Updated 28 days ago.

Status: Released
Priority: N/A
Category: Web - Nodes & inventories
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
User visibility: Getting started - demo | first install | Technique editor and level 1 Techniques
Effort required:
Priority: 97
Tags:

Description

The inventory can fail when some mount points are NFS shares and the NFS server is failing. In this case inventories pile up and never finish (they are simply killed).

We should include a timeout within inventory for disk exploration.


Related issues

Related to Rudder - Bug #14476: Inventory can not complete on an hypervisor if one of the guest machine is not accessible any more (New)

Associated revisions

Revision 62e0b4d3 (diff)
Added by Nicolas CHARLES about 1 month ago

Fixes #14190: Inventory may never finish if there is a disk issue or invalid mountpoint

History

#1

Updated by Alexis MOUSSET 5 months ago

  • Target version changed from 4.3.9 to 4.3.10
#2

Updated by François ARMAND 4 months ago

  • Target version changed from 4.3.10 to 4.3.11
#3

Updated by Vincent MEMBRÉ 2 months ago

  • Target version changed from 4.3.11 to 4.3.12
#4

Updated by François ARMAND 2 months ago

  • Related to Bug #14476: Inventory can not complete on an hypervisor if one of the guest machine is not accessible any more added
#5

Updated by François ARMAND about 1 month ago

  • Tags set to Sponsored
  • Subject changed from Inventory may not finish if there is a disk issue or invalid mountpoint to Inventory may never finish if there is a disk issue or invalid mountpoint
  • Description updated (diff)
  • Target version changed from 4.3.12 to 5.0.10
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Getting started - demo | first install | Technique editor and level 1 Techniques
  • Priority changed from 0 to 98
#6

Updated by François ARMAND about 1 month ago

  • Category changed from Packaging to Web - Nodes & inventories
#7

Updated by François ARMAND about 1 month ago

  • Assignee set to Nicolas CHARLES
#8

Updated by Vincent MEMBRÉ about 1 month ago

  • Target version changed from 5.0.10 to 5.0.11
#9

Updated by Nicolas CHARLES about 1 month ago

The issue is in

[debug] Running FusionInventory::Agent::Task::Inventory::Linux::Drives
[debug2] executing df -P -T -k

Setting up an NFS share and then shutting down the NFS server reproduces it.
There is already a timeout, set with "alarm" through --backend-collect-timeout (default 180s), but it has no effect in this case.
alarm doesn't seem to work correctly with filehandles in a non-interactive script (see https://perldoc.perl.org/functions/alarm.html and https://www.perlmonks.org/?node_id=999179 )

Changing getFileHandle in Tools.pm, in the command case, to the following doesn't solve the issue:

            eval {
                local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
                alarm 5;
                if (!open $handle, '-|', $params{command} . " 2>$nowhere") {
                    $params{logger}->error(
                        "Can't run command $params{command}: $ERRNO"
                    ) if $params{logger};
                    return; # only leaves the eval block, not getFileHandle
                }
                alarm 0; # cancel the pending alarm once open has returned
            };

#10

Updated by Nicolas CHARLES about 1 month ago

Actually, this returns; it is afterwards that it hangs, during the my $line = <$handle>; read.

#11

Updated by Nicolas CHARLES about 1 month ago

doing

    my $line;
    # get headers line first
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
        alarm 5;
        $line = <$handle>;
        alarm 0; # cancel the alarm once the line has been read
    };

lets execution reach the next step; however, it still hangs and the command never ends, probably because the handle is not released
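
For illustration only, here is a minimal sketch of how the child process behind the handle could be killed when the alarm fires, so that the handle can be closed. The helper name run_with_timeout is hypothetical and is not part of FusionInventory; the sketch also assumes the child can actually be killed, which may not hold for a process blocked on a dead NFS mount.

```perl
use strict;
use warnings;

# Hypothetical helper (not a FusionInventory function): run a command,
# collect its output lines, and give up after $timeout seconds.
sub run_with_timeout {
    my ($timeout, @command) = @_;

    # The pipe form of open returns the PID of the child process,
    # which is needed to kill it if the read times out.
    my $pid = open(my $handle, '-|', @command);
    return unless $pid;

    my @lines;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };    # NB: \n required
        alarm $timeout;
        @lines = <$handle>;
        alarm 0;                                     # read finished in time
        1;
    };
    alarm 0;    # make sure no alarm is left pending

    if (!$ok) {
        my $error = $@;
        # Assumption: the child reacts to SIGKILL. Closing the handle then
        # reaps it instead of waiting for the command to end by itself.
        kill 'KILL', $pid;
        close $handle;
        die $error if $error ne "alarm\n";    # propagate unexpected errors
        return;
    }

    close $handle;
    return \@lines;
}
```

A call such as run_with_timeout(5, 'df', '-P', '-T', '-k') would return a reference to the output lines, or nothing if the command could not be started or did not finish within 5 seconds.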

#12

Updated by Nicolas CHARLES about 1 month ago

An easier solution could be to check if timeout is present on the system and, if so, run "timeout 5 df"; if not, don't use a timeout.
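
A rough sketch of that idea, not necessarily what the final fix does: look for a timeout executable in PATH and prefix the command with it only when it is found. The helper names find_in_path and with_timeout are hypothetical and used for illustration only.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of $name if an executable
# file with that name exists in one of the PATH directories.
sub find_in_path {
    my ($name) = @_;
    foreach my $dir (File::Spec->path()) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Hypothetical helper: prefix $command with "timeout $seconds" when the
# timeout command is available, otherwise return the command unchanged.
sub with_timeout {
    my ($seconds, $command) = @_;
    return find_in_path('timeout')
        ? "timeout $seconds $command"
        : $command;
}

print with_timeout(5, 'df -P -T -k'), "\n";
```

On a system without timeout the command is run exactly as before, so the behaviour only changes where the tool is present.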

#13

Updated by Nicolas CHARLES about 1 month ago

  • Status changed from New to In progress
#14

Updated by Nicolas CHARLES about 1 month ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/1909
#15

Updated by Nicolas CHARLES about 1 month ago

  • Status changed from Pending technical review to Pending release
#16

Updated by Vincent MEMBRÉ 28 days ago

  • Status changed from Pending release to Released
  • Priority changed from 98 to 97

This bug has been fixed in Rudder 5.0.11 which was released today.
