Bug #14190

closed

Inventory may never finish if there is a disk issue or invalid mountpoint

Added by Nicolas CHARLES over 2 years ago. Updated almost 2 years ago.

Status:
Released
Priority:
N/A
Category:
Web - Nodes & inventories
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
User visibility:
Getting started - demo | first install | Technique editor and level 1 Techniques
Effort required:
Priority:
97
Tags:

Description

Inventory can hang when some mount points are on NFS and the NFS server is failing. In that case, inventories pile up and never complete (they are simply killed).

We should include a timeout within inventory for disk exploration.


Related issues

Related to Rudder - Bug #14476: Inventory can not complete on an hypervisor if one of the guest machine is not accessible any more - New - Benoît PECCATTE
Related to Rudder - Bug #18832: Rudder Agent consumes complete Memory because of fdisk - Released - Benoît PECCATTE
Actions #1

Updated by Alexis MOUSSET over 2 years ago

  • Target version changed from 4.3.9 to 4.3.10
Actions #2

Updated by François ARMAND about 2 years ago

  • Target version changed from 4.3.10 to 4.3.11
Actions #3

Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 4.3.11 to 4.3.12
Actions #4

Updated by François ARMAND about 2 years ago

  • Related to Bug #14476: Inventory can not complete on an hypervisor if one of the guest machine is not accessible any more added
Actions #5

Updated by François ARMAND about 2 years ago

  • Tags set to Sponsored
  • Subject changed from Inventory may not finish if there is a disk issue or invalid mountpoint to Inventory may never finish if there is a disk issue or invalid mountpoint
  • Description updated (diff)
  • Target version changed from 4.3.12 to 5.0.10
  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Getting started - demo | first install | Technique editor and level 1 Techniques
  • Priority changed from 0 to 98
Actions #6

Updated by François ARMAND about 2 years ago

  • Category changed from Packaging to Web - Nodes & inventories
Actions #7

Updated by François ARMAND almost 2 years ago

  • Assignee set to Nicolas CHARLES
Actions #8

Updated by Vincent MEMBRÉ almost 2 years ago

  • Target version changed from 5.0.10 to 5.0.11
Actions #9

Updated by Nicolas CHARLES almost 2 years ago

The issue is in

[debug] Running FusionInventory::Agent::Task::Inventory::Linux::Drives
[debug2] executing df -P -T -k

Setting up an NFS share and then shutting down the NFS server reproduces it.
There is already a timeout, set with "alarm" via --backend-collect-timeout (default 180s), but it fails in this case.
alarm doesn't seem to work correctly with filehandles in a non-interactive script (see https://perldoc.perl.org/functions/alarm.html and https://www.perlmonks.org/?node_id=999179 )

Changing getFileHandle in Tools.pm, for the command case, to the following doesn't solve the issue:

            eval {
                local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
                alarm 5;
                if (!open $handle, '-|', $params{command} . " 2>$nowhere") {
                    $params{logger}->error(
                        "Can't run command $params{command}: $ERRNO"
                    ) if $params{logger};
                    return;
                }
            };

Actions #10

Updated by Nicolas CHARLES almost 2 years ago

Actually, the open returns; the hang happens afterwards, on my $line = <$handle>;

Actions #11

Updated by Nicolas CHARLES almost 2 years ago

doing

    my $line;
    # get headers line first
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
        alarm 5;
        $line = <$handle>;
    };

allows execution to reach the next step; however, it still locks and the command doesn't end, probably because the handle is not released
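
One possible explanation for the handle not being released: close on a handle opened with '-|' waits for the child process to exit, so a df that is still blocked keeps close blocked as well. A minimal sketch of the alarm approach that also kills the child on timeout is given here. The helper name first_line_with_timeout is hypothetical (it is not part of FusionInventory), and the sketch assumes the list form of open so that the PID returned by open is the command itself rather than a shell. A process stuck in uninterruptible I/O may still not react to the signal, so this is an illustration, not a verified fix.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FusionInventory): run a command and return
# its first output line, or undef if nothing arrives within $timeout seconds.
sub first_line_with_timeout {
    my ($timeout, @command) = @_;

    # With the '-|' mode, open returns the PID of the child process.
    # The list form avoids an intermediate shell.
    my $pid = open(my $handle, '-|', @command);
    return unless $pid;

    my $line;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };    # NB: \n required
        alarm $timeout;
        $line = <$handle>;
        alarm 0;                                     # cancel the pending alarm
        1;
    };
    my $error = $@;
    alarm 0;    # make sure no alarm is left pending

    if (!$ok) {
        # Kill the child first: close on a piped handle waits for it to exit.
        kill 'KILL', $pid;
        close $handle;
        die $error unless $error eq "alarm\n";       # rethrow unrelated errors
        return;
    }

    close $handle;
    return $line;
}
```

For the case above, the call would look like first_line_with_timeout(5, 'df', '-P', '-T', '-k'), with undef meaning that df produced no header line within 5 seconds.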

Actions #12

Updated by Nicolas CHARLES almost 2 years ago

An easier solution could be to check whether timeout is present on the system and, if so, run "timeout 5 df"; if not, run df without a timeout.
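
A rough sketch of that idea, which is not necessarily what the linked pull request implements: look for a timeout executable in PATH and prefix the command with it only when it is found. The helper names can_run and with_timeout are hypothetical and introduced here for illustration only.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: true if an executable with this name exists in PATH.
sub can_run {
    my ($name) = @_;
    foreach my $dir (File::Spec->path()) {
        my $candidate = File::Spec->catfile($dir, $name);
        return 1 if -f $candidate && -x _;
    }
    return 0;
}

# Hypothetical helper: prefix the command with "timeout <seconds>" when the
# timeout executable is available, otherwise return the command unchanged.
sub with_timeout {
    my ($seconds, @command) = @_;
    return can_run('timeout')
        ? ('timeout', $seconds, @command)
        : @command;
}
```

With this, with_timeout(5, 'df', '-P', '-T', '-k') yields the "timeout 5 df -P -T -k" argument list on systems that ship timeout, and plain df elsewhere, so agents on systems without it keep the previous behavior.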

Actions #13

Updated by Nicolas CHARLES almost 2 years ago

  • Status changed from New to In progress
Actions #14

Updated by Nicolas CHARLES almost 2 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Nicolas CHARLES to Benoît PECCATTE
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/1909
Actions #15

Updated by Nicolas CHARLES almost 2 years ago

  • Status changed from Pending technical review to Pending release
Actions #16

Updated by Vincent MEMBRÉ almost 2 years ago

  • Status changed from Pending release to Released
  • Priority changed from 98 to 97

This bug has been fixed in Rudder 5.0.11 which was released today.

Actions #17

Updated by Nicolas CHARLES 4 months ago

  • Related to Bug #18832: Rudder Agent consumes complete Memory because of fdisk added