Actions
Bug #25738
openRace condition when the agent run during a server/relay upgrade
Pull Request:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
To do
Regression:
No
Description
When upgrading a Rudder relay or server, if an agent is already running, it can end up trying to restart jetty or apache2 (at least) while they are being stopped by the package scripts.
Given how system works, this leads to making the stop operation fail, hence interrupting the packaging script and making the upgrade fail.
For example, the server install output:
Job for apache2.service canceled. ************************************************************************************** ERROR: rudder-webapp postinstall script failed ! Trying to recover the problem, you should check that your instance is properly working You may need to execute: apt-get install -f You should also try to manually execute: /opt/rudder/bin/rudder-upgrade Such errors should not happen, please open an issue for this problem on https://issues.rudder.io/projects/rudder/issues/new **************************************************************************************
As an illustration, if you have a stop action that takes a long time, then calling start while the stop job is running will with the default job mode (replace) cancel the installed/running stop job, and install a start job in the unit's job slot.
And indeed in the agent logs:
2024-10-25T10:05:25+00:00 R: @@rudder-service-apache@@result_repaired@@policy-server-root@@rudder-service-apache-root@@apache_started@@Apache service@@Started@@2024-10-25 10:05:09+00:00##root@#Ensure that service apache2 is running was repaired
Actions