Bug #22288
Nodes could not update their policies anymore
Status:
closed
Pull Request:
Severity:
Major - prevents use of part of Rudder | no simple workaround
UX impact:
I dislike using that feature
User visibility:
Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority:
74
Name check:
To do
Fix check:
To do
Regression:
No
Description
I ran a debug session with rudder server debug <ip>, and then stopped it.
Later on, nodes could not update their policies anymore.
The logs on the server side showed:
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Deactivated successfully.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Scheduled restart job, restart counter is at 6.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Stopped CFEngine file server.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Start request repeated too quickly.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Failed with result 'start-limit-hit'.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Failed to start CFEngine file server.

root@15-237-108-211:/var/log/rudder/webapp# service rudder-cf-serverd status
× rudder-cf-serverd.service - CFEngine file server
     Loaded: loaded (/lib/systemd/system/rudder-cf-serverd.service; enabled; vendor preset: enabled)
     Active: failed (Result: start-limit-hit) since Fri 2023-01-20 10:57:33 UTC; 1h 50min ago
    Process: 708495 ExecStart=/opt/rudder/bin/cf-serverd --graceful-detach=600 --no-fork $VERBOSITY_OPTION (code=exited, status=0/SUCCESS)
   Main PID: 708495 (code=exited, status=0/SUCCESS)
        CPU: 318ms

Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Deactivated successfully.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Scheduled restart job, restart counter is at 6.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Stopped CFEngine file server.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Start request repeated too quickly.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: rudder-cf-serverd.service: Failed with result 'start-limit-hit'.
Jan 20 10:57:33 15-237-108-211.training.rudder.io systemd[1]: Failed to start CFEngine file server.
This was on Ubuntu 22, Rudder 7.2.3.
Manually restarting rudder-cf-serverd solved the issue.
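For reference, a minimal recovery sketch using standard systemctl commands (assuming a systemd host; reset-failed clears the start-limit-hit state so the unit can be started again without waiting for the rate-limit window to expire):

# clear the failed / start-limit-hit state of the unit
systemctl reset-failed rudder-cf-serverd
# start the file server again and verify it stays up
systemctl start rudder-cf-serverd
systemctl status rudder-cf-serverd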
Updated by Vincent MEMBRÉ almost 2 years ago
- Target version changed from 7.2.4 to 7.2.5
Updated by Benoît PECCATTE almost 2 years ago
- Severity set to Minor - inconvenience | misleading | easy workaround
- UX impact set to I dislike using that feature
- User visibility set to Operational - other Techniques | Rudder settings | Plugins
- Priority changed from 0 to 56
Updated by Alexis Mousset over 1 year ago
- Assignee set to Alexis Mousset
- Severity changed from Minor - inconvenience | misleading | easy workaround to Major - prevents use of part of Rudder | no simple workaround
- Priority changed from 56 to 75
Updated by Alexis Mousset over 1 year ago
What the provided logs say:
- A restart was triggered (either automatically or from outside).
- "restart counter is at 6" means that there were 6 restarts within a StartLimitIntervalSec interval.
- Apparently this reaches StartLimitBurst, so the service is not started (the default is apparently 5 restarts in a 10 second period).

Now the question that remains is why we have this many restarts:
- Could be because cf-serverd is unable to start (e.g. the port is already in use), and due to the Restart=always config it restarts in a loop until reaching the limit. This seems the most likely option. We would need to see the logs above the provided lines (see the commands sketched after this list).
- Could be because too many restarts were triggered by the graceful restart. More logs would also help to see whether that is what happens.
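One way to check which scenario applies on an affected server (a sketch using standard systemd tooling; note that systemctl show exposes the interval as StartLimitIntervalUSec) is to dump the effective rate-limit settings and pull the journal lines leading up to the failure:

# show the effective restart policy and rate limits for the unit
systemctl show rudder-cf-serverd -p Restart -p StartLimitBurst -p StartLimitIntervalUSec
# retrieve the log lines before the start-limit-hit, assuming journald still retains them
journalctl -u rudder-cf-serverd --since "2023-01-20 10:50" --no-pager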
Updated by Vincent MEMBRÉ over 1 year ago
- Target version changed from 7.2.5 to 7.2.6
- Priority changed from 75 to 74
Updated by Alexis Mousset over 1 year ago
Closing, please reopen if this happens again.