Bug #27125

open

Rudder package method stuck for 60 hours on Debian 12

Added by Michel BOUISSOU 2 months ago. Updated 16 days ago.

Status: New
Priority: 1 (highest)
Category: Generic methods
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: To do
Regression: No

Description

Server and node both run Rudder 8.3.2; the node is Debian 12 x86_64.

I noticed that the “system updates campaigns” were not running on a node, which reported that apt was locked by another process.

Checking with ps, I found that an “apt-get update” started by

python3 /var/rudder/cfengine-community/modules/packages/apt-get list-updates

had been running for ~60 hours and was obviously stuck, without ever being terminated.

This doesn't seem to come from the system updates campaigns but rather from a “package” method started by the Rudder agent.

The agent execution output in /var/rudder/cfengine-community/outputs corresponding to the agent run that started the stuck process is truncated and ends with:

2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@log_info@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@5de65f37-e442-4201-bc7d-567125891616@@10-050 - Rudder repository set to download.rudder.io for debian-family OS with Rudder 8.3@@/etc/apt/sources.list.d/rudder.list@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Build file /etc/apt/sources.list.d/rudder.list from mustache type template /var/rudder/cfengine-community/inputs/upgrade_rudder_agent_on_debian_family/1.0/resources/rudder.list.tpl was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_success@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@5de65f37-e442-4201-bc7d-567125891616@@10-050 - Rudder repository set to download.rudder.io for debian-family OS with Rudder 8.3@@/etc/apt/sources.list.d/rudder.list@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Build file /etc/apt/sources.list.d/rudder.list from mustache template /var/rudder/cfengine-community/inputs/upgrade_rudder_agent_on_debian_family/1.0/resources/rudder.list.tpl was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@log_info@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@10f780e3-d11d-41a4-bb69-912d3d707935@@10-060 - Rudder repository file permissions@@/etc/apt/sources.list.d/rudder.list@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Ensure permissions mode 644, owner root and group root on /etc/apt/sources.list.d/rudder.list on type all with 0 recursion level was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_success@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@10f780e3-d11d-41a4-bb69-912d3d707935@@10-060 - Rudder repository file permissions@@/etc/apt/sources.list.d/rudder.list@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Ensure permissions mode 644, owner root and group root on /etc/apt/sources.list.d/rudder.list was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_na@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@f1673845-2d12-4974-ab45-98dd3b6f0a2d@@10-070-10 - Rudder repository credentials file absent for public repository@@/etc/apt/auth.conf.d/rudder.conf@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Skipping method 'File absent' with key parameter '/etc/apt/auth.conf.d/rudder.conf' since condition 'debian.(!debian_rudder_password_provided_true)' is not reached was not applicable
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_success@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@ea37b528-0a4e-4269-ade3-1ab78cd66cb3@@10-070-20-10 - Rudder repository credentials file present@@/etc/apt/auth.conf.d/rudder.conf@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Presence of file /etc/apt/auth.conf.d/rudder.conf was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_success@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@cee42f3b-7e17-4478-9540-07d5bd8c5d36@@10-070-20-20 - Rudder repository credentials file content@@/etc/apt/auth.conf.d/rudder.conf@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Insert content into /etc/apt/auth.conf.d/rudder.conf was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@log_info@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@2f1ce6fc-6eb0-437a-aaca-2eabe1c8ae27@@10-070-20-30 - Rudder repository credentials file permissions@@/etc/apt/auth.conf.d/rudder.conf@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Ensure permissions mode 640, owner root and group root on /etc/apt/auth.conf.d/rudder.conf on type all with 0 recursion level was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_success@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@2f1ce6fc-6eb0-437a-aaca-2eabe1c8ae27@@10-070-20-30 - Rudder repository credentials file permissions@@/etc/apt/auth.conf.d/rudder.conf@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Ensure permissions mode 640, owner root and group root on /etc/apt/auth.conf.d/rudder.conf was correct
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_na@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@6738eedc-9fea-4a3f-9f9f-4e3e856d3e55@@10-080 - Update packages list@@apt update@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Skipping method 'Command execution' with key parameter 'apt update' since condition 'debian.(file_from_template__etc_apt_sources_list_d_rudder_list_repaired|file_lines_present__etc_apt_auth_conf_d_rudder_conf_repaired)' is not reached was not applicable
2025-06-16T18:05:13+00:00 R: @@upgrade_rudder_agent_on_debian_family@@result_na@@4a0d9768-070d-4e5c-8153-d119da3706e8@@8a0e373b-d8dc-41f2-a91a-34e52df832c7@@4060dd81-5fb7-4c3e-870b-891c1a9f4b2a@@10-090 - Purge available packages cache@@rm -f /var/rudder/cfengine-community/state/packages_updates*@@2025-06-16 18:05:08+00:00##872d703d-2fd2-4641-b3f8-0f98d0f7226b@#Skipping method 'Command execution' with key parameter 'rm -f /var/rudder/cfengine-community/state/packages_updates*' since condition 'debian.(command_execution_apt_update_kept|command_execution_apt_update_repaired)' is not reached was not applicable
cf-execd: timeout waiting for output from agent (agent_expireafter=120) - terminating it
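
To tell at a glance whether an output file ends with a regular report or with something else (here, the cf-execd timeout message), the "@@"-separated report lines above can be split mechanically. The following is a minimal sketch, not Rudder code: report_field is a hypothetical helper, and the reading of the fields (1 = technique, 2 = status such as result_success or result_na) is an assumption based only on the shape of the lines shown in this ticket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the n-th "@@"-delimited field (1-based) of a
 * report line into out. Only fields that are followed by another "@@" can
 * be extracted (fields 1 to 7 in the lines above); the execution timestamp
 * is followed by "##" and is therefore out of scope for this sketch.
 * Returns false when the line does not contain such a field, which is the
 * case for non-report lines like the cf-execd timeout message.
 * The value is silently truncated if out is too small. */
bool report_field(const char *line, int n, char *out, size_t out_len)
{
    if (line == NULL || out == NULL || out_len == 0 || n < 1)
    {
        return false;
    }

    /* Find the n-th "@@" delimiter; the field starts right after it. */
    const char *p = strstr(line, "@@");
    for (int i = 1; p != NULL && i < n; i++)
    {
        p = strstr(p + 2, "@@");
    }
    if (p == NULL)
    {
        return false;
    }
    p += 2;

    /* The field ends at the next "@@" delimiter. */
    const char *end = strstr(p, "@@");
    if (end == NULL)
    {
        return false;
    }

    size_t len = (size_t) (end - p);
    if (len >= out_len)
    {
        len = out_len - 1;
    }
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}
```

Applied to the last lines of the output above, report_field(line, 2, ...) yields result_na for the two skipped 'Command execution' methods and returns false for the final cf-execd line, which is how a truncated run can be recognised.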

Files

Rudder_update_bloque_250619a_t.png (37.4 KB) Stuck process Michel BOUISSOU, 2025-06-19 09:23
Rudder_update_bloque_250619b.png (19.3 KB) Process has been running ~60 hours Michel BOUISSOU, 2025-06-19 09:23
Actions #1

Updated by François ARMAND 2 months ago

  • Assignee set to Benoît PECCATTE
  • Priority changed from To review to 1 (highest)
Actions #2

Updated by Félix DALLIDET about 2 months ago

  • Target version changed from 8.3.3 to 8.3.4
Actions #3

Updated by Alexis Mousset 16 days ago

  • Assignee changed from Benoît PECCATTE to Alexis Mousset
Actions #4

Updated by Alexis Mousset 16 days ago

The code handling the agent timeout:

        if (!IsReadReady(fileno(pp),
                         config->agent_expireafter * SECONDS_PER_MINUTE))
        {
            char errmsg[] =
                "cf-execd: timeout waiting for output from agent" 
                " (agent_expireafter=%d) - terminating it\n";

            fprintf(fp, errmsg, config->agent_expireafter);
            /* Trim '\n' before Log()ing. */
            errmsg[strlen(errmsg) - 1] = '\0';
            Log(LOG_LEVEL_NOTICE, errmsg, config->agent_expireafter);
            count++;

            pid_t pid_shell;

            if (PipeToPid(&pid_shell, pp))
            {
                /* Default to killing the shell process (if we fail to get
                 * more precise target). */
                pid_t pid_to_kill = pid_shell;

#ifndef __MINGW32__
                /* The agent command is executed in a shell. Trying to kill the
                 * shell may end up sending it SIGKILL which is not propagated
                 * to the subprocesses of the shell and thus the cf-agent
                 * process. The shell, however, creates a new process group
                 * (with the PGID equal to the PID of the child process) for the
                 * agent which then allows us to kill the whole process group
                 * here. */

                /* We need to determine the PID of the agent (and thus its
                 * process group) first.*/
                ClearProcessTable();
                if (LoadProcessTable())
                {
                    ProcessSelect ps = PROCESS_SELECT_INIT;
                    ps.min_ppid = pid_shell;
                    ps.max_ppid = pid_shell;
                    Item *procs = SelectProcesses(".*" /* any command */, &ps, true /* apply ps */);
                    if (procs != NULL)
                    {
                        pid_to_kill = procs->counter;

                        /* There should only be one child process of the shell
                         * running by default. But it doesn't apply to all
                         * values of exec_command in general. */
                        assert(procs->next == NULL);
                    }
                }

                /* kill(-pid) is actually kill(pgid=pid) and kill(pid, 0) just
                 * checks if it's possible to send signals to pid (or pgid in
                 * our case). Kill the whole process group if possible. */
                if ((getpgid(pid_to_kill) == pid_to_kill)
                    && (kill(-pid_to_kill, 0) == 0))
                {
                    pid_to_kill = -pid_to_kill;
                }
#endif
                ProcessSignalTerminate(pid_to_kill);
            }
            else
            {
                Log(LOG_LEVEL_ERR, "Could not get PID of agent");
            }

            break;
        }
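
The two mechanisms in the excerpt above (waiting for output with a timeout, then signalling a whole process group through a negative PID) can be reproduced in isolation. The following is a minimal POSIX sketch for experimenting with that behaviour, not the CFEngine implementation: read_ready and run_with_group_timeout are hypothetical names, the child is explicitly made a process group leader with setpgid() instead of being looked up in the process table, and SIGKILL is sent directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: true if fd is readable (data or EOF) within
 * timeout_s seconds. An interrupted select() is treated as a timeout,
 * which is good enough for a sketch. */
static bool read_ready(int fd, int timeout_s)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_s, .tv_usec = 0 };
    return select(fd + 1, &rfds, NULL, NULL, &tv) > 0;
}

/* Hypothetical helper: run `sh -c cmd` in its own process group and wait
 * up to timeout_s seconds for its first output (or EOF).
 * Returns 0 if output arrived in time, 1 if the timeout expired and the
 * whole group was killed, -1 if the pipe or fork could not be set up. */
int run_with_group_timeout(const char *cmd, int timeout_s)
{
    int fds[2];
    if (pipe(fds) != 0)
    {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: become leader of a new group (PGID == own PID), so that
         * every process started by the shell inherits that group. */
        setpgid(0, 0);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);
    }

    /* Parent: set the group as well to avoid racing with the child; this
     * may fail harmlessly once the child has already called exec. */
    setpgid(pid, pid);
    close(fds[1]);

    int timed_out = read_ready(fds[0], timeout_s) ? 0 : 1;
    if (timed_out)
    {
        /* A negative PID addresses the whole process group. */
        kill(-pid, SIGKILL);
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return timed_out;
}
```

Calling run_with_group_timeout("sleep 30 & sleep 30", 1) returns 1 after about one second, and both sleep processes are killed together with the shell. Note that a group-wide signal only reaches processes that are still members of that group; a descendant that has moved to a different process group or session is not affected by it.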