Bug #25738

open

Race condition when the agent runs during a server/relay upgrade

Added by Alexis Mousset 28 days ago. Updated 15 days ago.

Status: New
Priority: N/A
Assignee: -
Category: Packaging
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: To do
Regression: No

Description

When upgrading a Rudder relay or server, if an agent is already running, it can end up trying to restart jetty or apache2 (at least) while they are being stopped by the package scripts.

Given how systemd works, this makes the stop operation fail, which interrupts the packaging script and makes the upgrade fail.

For example, the server install output:

Job for apache2.service canceled.
**************************************************************************************
ERROR: rudder-webapp postinstall script failed !

Trying to recover the problem, you should check that your instance is properly working

You may need to execute: apt-get install -f
You should also try to manually execute: /opt/rudder/bin/rudder-upgrade

   Such errors should not happen, please open an issue for this problem on 
            https://issues.rudder.io/projects/rudder/issues/new
**************************************************************************************

As an illustration, if you have a stop action that takes a long time, then calling start while the stop job is running will with the default job mode (replace) cancel the installed/running stop job, and install a start job in the unit's job slot.

(https://serverfault.com/questions/936220/what-could-cause-a-systemd-service-stop-to-end-with-job-being-canceled)
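The behaviour quoted above can be sketched with a toy model. This is not systemd code: the `Unit` and `Job` classes and the `enqueue` method are hypothetical names made up for the illustration, and the model only assumes that each unit has a single job slot, as the quote describes. The "fail" mode is modelled loosely on `systemctl --job-mode=fail`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """A queued operation on a unit (toy model, not a systemd type)."""
    kind: str               # "start" or "stop"
    state: str = "running"  # "running" or "canceled"


@dataclass
class Unit:
    """A unit with one job slot, as described in the quoted answer."""
    name: str
    slot: Optional[Job] = None
    messages: List[str] = field(default_factory=list)

    def enqueue(self, kind: str, mode: str = "replace") -> Job:
        pending = self.slot
        conflict = (
            pending is not None
            and pending.state == "running"
            and pending.kind != kind
        )
        if conflict and mode == "fail":
            # Assumed to mirror --job-mode=fail: the new request is refused
            # and the pending job is left alone.
            raise RuntimeError(f"{kind} conflicts with pending {pending.kind}")
        if conflict:
            # Default "replace" mode: the pending job is canceled and the
            # new job takes over the slot.
            pending.state = "canceled"
            self.messages.append(f"Job for {self.name} canceled.")
        job = Job(kind)
        self.slot = job
        return job


# The package script stops apache2, then the agent starts it again
# before the stop job has finished.
apache = Unit("apache2.service")
stop_job = apache.enqueue("stop")    # package script
start_job = apache.enqueue("start")  # agent run
print(stop_job.state)
print(apache.messages)
```

In this model the stop job requested by the package script ends up canceled, which matches the "Job for apache2.service canceled." line in the install output above.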

And indeed in the agent logs:

2024-10-25T10:05:25+00:00 R: @@rudder-service-apache@@result_repaired@@policy-server-root@@rudder-service-apache-root@@apache_started@@Apache service@@Started@@2024-10-25 10:05:09+00:00##root@#Ensure that service apache2 is running  was repaired

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #24559: When upgrading from 8.0 to 8.1 on Ubuntu 22, the webapp doesn't start (Resolved)