Project

General

Profile

Actions

Bug #19547

closed

Relayd deadlocks when trying to forward a shared-file

Added by Alexis Mousset over 3 years ago. Updated over 3 years ago.

Status:
Released
Priority:
N/A
Category:
Relay server or API
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
0
Name check:
To do
Fix check:
Checked
Regression:

Description

Relayd stops working when the api is called with a shared-files request requiring to forward the file upstream.

After the request all API requests timeout and the service won't stop with a sigterm.

This happens in 6.1.14 (and probably older releases).

Actions #1

Updated by Alexis Mousset over 3 years ago

  • Status changed from New to In progress
  • Assignee set to Alexis Mousset
Actions #2

Updated by Alexis Mousset over 3 years ago

on the relay receiving the request:

 TRACE hyper::proto::h1::conn: Conn::read_head
 DEBUG hyper::proto::h1::io: read 442 bytes
 TRACE hyper::proto::h1::role: Request.parse([Header; 100], [u8; 442])
 TRACE hyper::proto::h1::role: Request.parse Complete(442)
 DEBUG hyper::proto::h1::io: parsed 7 headers
 DEBUG hyper::proto::h1::conn: incoming body is empty
 TRACE warp::filters::path: "rudder"?: "rudder" 
 TRACE warp::filters::path: "relay-api"?: "relay-api" 
 TRACE warp::filters::path: "1"?: "1" 
 TRACE warp::filters::path: "system"?: "shared-files" 
 TRACE warp::filters::path: "remote-run"?: "shared-files" 
 TRACE warp::filters::path: "shared-files"?: "shared-files" 
 TRACE warp::filters::method: method::PUT?: HEAD
 TRACE warp::filters::method: method::HEAD?: HEAD
 TRACE warp::filters::path: param?: "4756c0ec-b7f5-4046-8e6c-7e185bbba256" 
 TRACE warp::filters::path: param?: "4db80a96-a2cf-43f1-b4da-c75deaa8dc01" 
 TRACE warp::filters::path: param?: "toto" 
 TRACE shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: hyper::client::pool: checkout waiting for idle connection: "https://server" 
 TRACE shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: hyper::client::connect::http: Http::connect; scheme=https, host=server, port=None
 DEBUG hyper::client::connect::dns: resolving host="server" 
 DEBUG shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: hyper::client::connect::http: connecting to 192.168.5.2:443
 DEBUG shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: tokio_reactor: adding I/O source: 33554440
 TRACE shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: mio::poll: registering with poller
 DEBUG shared_files_head{target_id=4756c0ec-b7f5-4046-8e6c-7e185bbba256 source_id=4db80a96-a2cf-43f1-b4da-c75deaa8dc01 file_id=toto}: tokio_reactor::registration: scheduling Write for: 8

on the relay's policy server:

juil. 21 13:10:44 server rudder-relayd[1043]: INFO relayd::relay-api: 127.0.0.1:58642 "HEAD /rudder/relay-api/1/shared-files/4756c0ec-b7f5-4046-8e6c-7e185bbba256/root/toto HTTP/1.1" 200 "-" "curl/7.61.0" 843.172µs

on the agent:

# /opt/rudder/bin/curl https://relay/rudder/relay-api/shared-files/4756c0ec-b7f5-4046-8e6c-7e185bbba256/4db80a96-a2cf-43f1-b4da-c75deaa8dc01/toto?hash=28d3ee069a6e69a679e08f1505bc4bc0dcc612c4e2292dccfa8afb9ccc6f45039060ce698f980954a82c436fd481d413c44fe9735472db19ce4d3f22283e4f1d --insecure --tlsv1 --location --head
HTTP/1.1 502 Proxy Error
Date: Wed, 21 Jul 2021 13:10:50 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips
Content-Type: text/html; charset=iso-8859-1
Actions #3

Updated by Alexis Mousset over 3 years ago

  • Description updated (diff)
Actions #4

Updated by Alexis Mousset over 3 years ago

Relayd:

# strace -fp 17920
strace: Process 17920 attached with 7 threads
[pid 17926] futex(0x56182030cb64, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
[pid 17925] futex(0x7fb0bc00fc84, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
[pid 17924] futex(0x56182035af90, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 17923] futex(0x56182035af90, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 17922] futex(0x56182035af90, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 17921] futex(0x56182035b064, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
[pid 17920] futex(0x5618203650a4, FUTEX_WAIT_PRIVATE, 1, NULL

all threads are stuck on FUTEX_WAIT.

Actions #5

Updated by Alexis Mousset over 3 years ago

Also happens with up-to-date system and up-to-date relayd dependencies.

Actions #6

Updated by Alexis Mousset over 3 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to François ARMAND
  • Pull Request set to https://github.com/Normation/rudder/pull/3735
Actions #7

Updated by Alexis Mousset over 3 years ago

  • Status changed from Pending technical review to Pending release
Actions #8

Updated by François ARMAND over 3 years ago

  • Fix check changed from To do to Checked
Actions #9

Updated by Vincent MEMBRÉ over 3 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 6.1.16 and 6.2.10 which were released today.

Actions

Also available in: Atom PDF