Bug #20685

closed

Excessive Agent restarts (Agent on Debian 11)

Added by Nigel Mundy about 2 years ago. Updated about 2 years ago.

Status: Released
Priority: N/A
Category: Agent
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: Checked
Regression:

Description

Hi,
I believe this is a bug, but I cannot see anything in the logs that explains it. I'm happy to take any guidance on how to debug it further.

We monitor our servers with Zabbix, and since upgrading the Rudder agents (and server) to v7.0.0 we have been seeing an excessive number of agent restarts across many of our nodes. We have 88 nodes, and in just the last 10 hours we have had over 1000 alerts about Rudder services restarting on 43 of them. This wasn't the case with Rudder v6.1.12.

I have attached the following details to help:

restart counts.txt
- the restart counts per node received in the last 10 hours, showing that the volumes are not consistent and that not every node is affected.

restart events.txt
- a detailed breakdown of the timings of the last 1000 process restarts.

For the current worst offender (vps001.bhs), I have also attached the following files covering 12:00-15:00 UTC today:

syslog, daemon.log (standard log files)
rudder.log (we direct all Rudder and CFEngine events to this log)
promise and CFEngine logs for the period around a couple of the restarts.

Environment Details for the agent:

root@vps001:/var/log# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

root@vps001:/var/log# cat /etc/apt/sources.list.d/rudder.list
deb https://repository.rudder.io/apt/7.0/ bullseye main

root@vps001:/var/log# dpkg -l | grep -i rudder
ii  rudder-agent                   7.0.0-debian11                 amd64        Configuration management and audit tool - agent

root@vps001:~# systemctl status rudder*
rudder-agent.service - Rudder agent umbrella service
     Loaded: loaded (/lib/systemd/system/rudder-agent.service; enabled; vendor preset: enabled)
     Active: active (exited) since Tue 2022-02-01 15:32:33 UTC; 6min ago
       Docs: man:rudder(8)
             https://docs.rudder.io
    Process: 1492011 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1492011 (code=exited, status=0/SUCCESS)
        CPU: 3ms

Feb 01 15:32:33 vps001 systemd[1]: Starting Rudder agent umbrella service...
Feb 01 15:32:33 vps001 systemd[1]: Finished Rudder agent umbrella service.

rudder-cf-serverd.service - CFEngine file server
     Loaded: loaded (/lib/systemd/system/rudder-cf-serverd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-02-01 15:32:33 UTC; 6min ago
   Main PID: 1492017 (cf-serverd)
      Tasks: 1 (limit: 1128)
     Memory: 6.6M
        CPU: 573ms
     CGroup: /system.slice/rudder-cf-serverd.service
             └─1492017 /opt/rudder/bin/cf-serverd --graceful-detach=600 --no-fork --inform

Feb 01 15:32:33 vps001 systemd[1]: Started CFEngine file server.
Feb 01 15:32:36 vps001 cf-serverd[1492017]:   notice: Server is starting...
Feb 01 15:32:36 vps001 cf-serverd[1492017]: CFEngine(server) rudder Server is starting...

rudder-cf-execd.service - CFEngine Execution Scheduler
     Loaded: loaded (/lib/systemd/system/rudder-cf-execd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-02-01 15:32:33 UTC; 6min ago
   Main PID: 1492015 (cf-execd)
      Tasks: 1 (limit: 1128)
     Memory: 104.2M
        CPU: 22.877s
     CGroup: /system.slice/rudder-cf-execd.service
             └─1492015 /opt/rudder/bin/cf-execd --no-fork

Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sudoParameters@@result_success@@32377fd7-02fd-43d0-aab7-28460>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder Deleted file '/var/rudder/tmp/check_ssh_key_distribution//root.aut>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@Common@@result_na@@hasPolicyServer-root@@common-hasPolicyServ>
Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@end@@20220128-232623-d3d4d1b>

ls -l /lib/systemd/system/rudder*
-rw-r--r-- 1 root root 469 Nov 22  2017 /lib/systemd/system/rudder-agent.service
-rw-r--r-- 1 root root 512 Nov 22  2017 /lib/systemd/system/rudder-cf-execd.service
-rw-r--r-- 1 root root 522 Nov 22  2017 /lib/systemd/system/rudder-cf-serverd.service
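To cross-check the Zabbix figures on a single node, the restarts can also be counted from the systemd journal. The sketch below is a hypothetical helper (`count_rudder_starts` is not part of Rudder). It assumes systemd logs a `Started <Description>.` line each time a unit starts, as in the `rudder-cf-serverd` status output above, and that the unit descriptions are the ones shown there. It only sees starts that go through systemd, so a daemon restarted some other way would not be counted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rudder): read journal text on stdin and
# print "<count><TAB><unit description>" for every Rudder CFEngine unit start.
# Assumes systemd's "systemd[1]: Started <Description>." message format and the
# descriptions "CFEngine file server" / "CFEngine Execution Scheduler".
count_rudder_starts() {
    awk '/systemd\[1\]: Started CFEngine / {
        sub(/.*Started /, "")   # keep only the unit description
        sub(/\.$/, "")          # drop the trailing full stop
        n[$0]++                 # one more start for this unit
    }
    END { for (u in n) print n[u] "\t" u }'
}

# Example usage on the node (needs read access to the journal):
#   journalctl --since "10h ago" -u rudder-cf-execd -u rudder-cf-serverd | count_rudder_starts
```

Counting systemd's own "Started" messages, rather than process IDs, keeps the check independent of the monitoring tool, which makes it easier to tell whether the restarts Zabbix reports are real service restarts.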


Files

clipboard-202202021442-i6khe.png (43.7 KB), Nicolas CHARLES, 2022-02-02 14:42
