Bug #20685

Updated by Alexis Mousset almost 3 years ago

Hi,  
 I believe this is a bug, but I cannot see anything amiss in the logs that would explain it. I am willing to take any guidance on how to debug this further. 


 Our servers are monitored with Zabbix, and since upgrading the Rudder agents (and server) to v7.0.0 we have seen an excessive number of agent restarts across many of our nodes. We have 88 nodes, and in the last 10 hours alone we have received over 1000 alerts for Rudder services restarting on 43 of them. This wasn't the case with Rudder v6.1.12. 


 I have attached various details and info to try and assist: 

 restart counts.txt 
  - a list of the restart counts received in the last 10 hours, showing that the volumes aren't consistent and that not every node is affected. 

 restart events.txt 
  - a detailed breakdown of the last 1000 process restart timings. 


 Then, taking the current worst offender (vps001.bhs), I have the following files for the period 12:00-15:00 UTC today: 

 syslog, daemon.log (standard log files) 
 rudder.log (we direct all rudder and cf events to this log) 
 promise and CFEngine logs for the period around a couple of the restarts. 
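
 For anyone who wants to cross-check the restart volumes against the raw logs, here is a minimal sketch of one way to tally service starts per host. It is not how the attached counts were produced (those come from the monitoring alerts). It assumes syslog-style lines in the same layout as the systemd messages in this report, for example "Started CFEngine file server."; the function name `count_starts` and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Assumed line layout (taken from the systemd messages in this report):
#   Feb 01 15:32:33 vps001 systemd[1]: Started CFEngine file server.
START_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\ssystemd\[1\]:\s"
    r"Started\s(?P<unit>.+?)\.?\s*$"
)


def count_starts(lines):
    """Count systemd 'Started <unit>' messages, keyed by (host, unit description)."""
    counts = Counter()
    for line in lines:
        match = START_RE.match(line.strip())
        if match:
            counts[(match.group("host"), match.group("unit"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the status output in this report.
    sample = [
        "Feb 01 15:32:33 vps001 systemd[1]: Started CFEngine file server.",
        "Feb 01 15:32:36 vps001 cf-serverd[1492017]:     notice: Server is starting...",
        "Feb 01 15:32:33 vps001 systemd[1]: Finished Rudder agent umbrella service.",
    ]
    for (host, unit), n in count_starts(sample).items():
        print(host, unit, n)
```

 To run it against a real log, pass an open file instead of the sample list, e.g. `count_starts(open("/var/log/syslog"))`; adjust the path to wherever systemd messages are logged on the node.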


 Environment Details for the agent: 
 <pre> 
 root@vps001:/var/log# lsb_release -a 
 No LSB modules are available. 
 Distributor ID: Debian 
 Description:      Debian GNU/Linux 11 (bullseye) 
 Release:          11 
 Codename:         bullseye 

 root@vps001:/var/log# cat /etc/apt/sources.list.d/rudder.list 
 deb https://repository.rudder.io/apt/7.0/ bullseye main 

 root@vps001:/var/log# dpkg -l | grep -i rudder 
 ii    rudder-agent                     7.0.0-debian11                   amd64          Configuration management and audit tool - agent 

 root@vps001:~# systemctl status rudder* 
 rudder-agent.service - Rudder agent umbrella service 
      Loaded: loaded (/lib/systemd/system/rudder-agent.service; enabled; vendor preset: enabled) 
      Active: active (exited) since Tue 2022-02-01 15:32:33 UTC; 6min ago 
        Docs: man:rudder(8) 
              https://docs.rudder.io 
     Process: 1492011 ExecStart=/bin/true (code=exited, status=0/SUCCESS) 
    Main PID: 1492011 (code=exited, status=0/SUCCESS) 
         CPU: 3ms 

 Feb 01 15:32:33 vps001 systemd[1]: Starting Rudder agent umbrella service... 
 Feb 01 15:32:33 vps001 systemd[1]: Finished Rudder agent umbrella service. 

 rudder-cf-serverd.service - CFEngine file server 
      Loaded: loaded (/lib/systemd/system/rudder-cf-serverd.service; enabled; vendor preset: enabled) 
      Active: active (running) since Tue 2022-02-01 15:32:33 UTC; 6min ago 
    Main PID: 1492017 (cf-serverd) 
       Tasks: 1 (limit: 1128) 
      Memory: 6.6M 
         CPU: 573ms 
      CGroup: /system.slice/rudder-cf-serverd.service 
              └─1492017 /opt/rudder/bin/cf-serverd --graceful-detach=600 --no-fork --inform 

 Feb 01 15:32:33 vps001 systemd[1]: Started CFEngine file server. 
 Feb 01 15:32:36 vps001 cf-serverd[1492017]:     notice: Server is starting... 
 Feb 01 15:32:36 vps001 cf-serverd[1492017]: CFEngine(server) rudder Server is starting... 

 rudder-cf-execd.service - CFEngine Execution Scheduler 
      Loaded: loaded (/lib/systemd/system/rudder-cf-execd.service; enabled; vendor preset: enabled) 
      Active: active (running) since Tue 2022-02-01 15:32:33 UTC; 6min ago 
    Main PID: 1492015 (cf-execd) 
       Tasks: 1 (limit: 1128) 
      Memory: 104.2M 
         CPU: 22.877s 
      CGroup: /system.slice/rudder-cf-execd.service 
              └─1492015 /opt/rudder/bin/cf-execd --no-fork 

 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sudoParameters@@result_success@@32377fd7-02fd-43d0-aab7-28460> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder Deleted file '/var/rudder/tmp/check_ssh_key_distribution//root.aut> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@sshKeyDistribution@@result_na@@32377fd7-02fd-43d0-aab7-28460a> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@Common@@result_na@@hasPolicyServer-root@@common-hasPolicyServ> 
 Feb 01 15:37:31 vps001 cf-agent[1494209]: CFEngine(agent) rudder R: @@Common@@control@@rudder@@run@@0@@end@@20220128-232623-d3d4d1b> 

 ls -l /lib/systemd/system/rudder* 
 -rw-r--r-- 1 root root 469 Nov 22    2017 /lib/systemd/system/rudder-agent.service 
 -rw-r--r-- 1 root root 512 Nov 22    2017 /lib/systemd/system/rudder-cf-execd.service 
 -rw-r--r-- 1 root root 522 Nov 22    2017 /lib/systemd/system/rudder-cf-serverd.service 

 </pre>  
 
 
