Bug #22288 (closed): Nodes could not update their policies anymore

Added by Nicolas CHARLES almost 2 years ago. Updated over 1 year ago.

Status: Rejected
Priority: N/A
Category: Server components
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact: I dislike using that feature
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority: 74
Name check: To do
Fix check: To do
Regression: No

Description

I ran a debug session with rudder server debug <ip>, and then stopped it.
Later on, nodes could not update their policies anymore.

The logs on the server side showed:

Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Deactivated successfully.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Scheduled restart job, restart counter is at 6.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Stopped CFEngine file server.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Start request repeated too quickly.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Failed with result 'start-limit-hit'.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Failed to start CFEngine file server.
root@15-237-108-211:/var/log/rudder/webapp# service rudder-cf-serverd status
× rudder-cf-serverd.service - CFEngine file server
     Loaded: loaded (/lib/systemd/system/rudder-cf-serverd.service; enabled; vendor preset: enabled)
     Active: failed (Result: start-limit-hit) since Fri 2023-01-20 10:57:33 UTC; 1h 50min ago
    Process: 708495 ExecStart=/opt/rudder/bin/cf-serverd --graceful-detach=600 --no-fork $VERBOSITY_OPTION (code=exited, status=0/SUCCESS)
   Main PID: 708495 (code=exited, status=0/SUCCESS)
        CPU: 318ms

Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Deactivated successfully.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Scheduled restart job, restart counter is at 6.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Stopped CFEngine file server.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Start request repeated too quickly.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Failed with result 'start-limit-hit'.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Failed to start CFEngine file server.

This was on Ubuntu 22, with Rudder 7.2.3.
Manually restarting rudder-cf-serverd solved the issue.
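
For reference, a manual recovery from the start-limit-hit state typically looks like the following sketch (systemctl reset-failed flushes the restart rate counter, which can otherwise block a manual start until the StartLimitIntervalSec window has elapsed):

# clear the failed state and the start rate counter, then start again
systemctl reset-failed rudder-cf-serverd
systemctl start rudder-cf-serverd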

Actions #1

Updated by Vincent MEMBRÉ almost 2 years ago

  • Target version changed from 7.2.4 to 7.2.5
Actions #2

Updated by Benoît PECCATTE almost 2 years ago

  • Severity set to Minor - inconvenience | misleading | easy workaround
  • UX impact set to I dislike using that feature
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 56
Actions #3

Updated by Alexis Mousset almost 2 years ago

  • Assignee set to Alexis Mousset
  • Severity changed from Minor - inconvenience | misleading | easy workaround to Major - prevents use of part of Rudder | no simple workaround
  • Priority changed from 56 to 75
Actions #4

Updated by Alexis Mousset almost 2 years ago

What the provided logs say:

  • A restart was triggered (either automatically or from outside)
  • restart counter is at 6 means that there were 6 restarts within a single StartLimitIntervalSec interval.
  • This reaches StartLimitBurst, so the service is not started again (the systemd default is apparently 5 starts within a 10-second interval).

Now the question that remains is why there were this many restarts:

  • It could be that cf-serverd is unable to start (e.g. the port is already in use), and due to the Restart=always config it restarts in a loop until reaching the limit. This seems the most likely option. We would need to see the logs preceding the provided lines.
  • It could be that too many restarts are triggered by the graceful restart mechanism. More logs would also help determine whether that is what happened (see the sketch after this list).
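
As an illustrative sketch (not an official Rudder recommendation; the values below are examples), the effective limits can be inspected, and loosened with a drop-in override if needed:

# show the limits systemd currently applies to the unit
systemctl show -p StartLimitBurst -p StartLimitIntervalUSec rudder-cf-serverd

# /etc/systemd/system/rudder-cf-serverd.service.d/override.conf
[Unit]
# example values: allow up to 10 starts within 60 seconds before start-limit-hit
StartLimitIntervalSec=60
StartLimitBurst=10

Then run systemctl daemon-reload for the override to take effect.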
Actions #5

Updated by Vincent MEMBRÉ almost 2 years ago

  • Target version changed from 7.2.5 to 7.2.6
  • Priority changed from 75 to 74
Actions #6

Updated by Alexis Mousset over 1 year ago

Closing, please reopen if this happens again.

Actions #7

Updated by Alexis Mousset over 1 year ago

  • Status changed from New to Rejected