Project

General

Profile

Actions

Bug #21170

closed

502 - Proxy Error when triggering Rudder agent

Added by Florent NEYRON almost 2 years ago. Updated 10 months ago.

Status:
Released
Priority:
N/A
Category:
Web - Nodes & inventories
Target version:
Severity:
Major - prevents use of part of Rudder | no simple workaround
UX impact:
I dislike using that feature
User visibility:
Infrequent - complex configurations | third party integrations
Effort required:
Priority:
54
Name check:
To do
Fix check:
To do
Regression:
No

Description

I often get this error when triggering Rudder agent


Files

Actions #1

Updated by Florent NEYRON almost 2 years ago

  • Category set to Agent
  • Target version set to 7.1.2
Actions #2

Updated by Alexis Mousset almost 2 years ago

  • Category changed from Agent to Web - Nodes & inventories
Actions #3

Updated by Alexis Mousset almost 2 years ago

  • Status changed from New to Rejected
  • User visibility set to Infrequent - complex configurations | third party integrations
  • Priority changed from 0 to 41

IIRC this was due to incorrect DNS configuration on our lab. Please reopen and give more information (webapp, apache and relayd logs) if the problem still occurs.

Actions #4

Updated by François ARMAND almost 2 years ago

  • Status changed from Rejected to New
  • Target version changed from 7.1.2 to 7.1.3
  • UX impact set to I dislike using that feature
  • Priority changed from 41 to 40

So, it is NOT a DNS problem. We are still triggering it once in a while when doing several changes and triggering remote run frequently.

Actions #5

Updated by Vincent MEMBRÉ almost 2 years ago

  • Target version changed from 7.1.3 to 7.1.4
Actions #6

Updated by François ARMAND almost 2 years ago

Not that we have hundreds of proxy errors log in /var/log/rudder/apache2

access.log-20220718:192.168.254.10 - - [18/Jul/2022:15:42:18 +0200] "GET /rudder/lift/comet/93744016213/F94197790062QGIY3U/F94197790988GIIXQT?F94197790071TTMR41=94197790074&F94197790076KFMU2D=94197791586&_=1658149891360 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/configurationManager/directiveManagement" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [18/Jul/2022:15:43:23 +0200] "GET /rudder/lift/comet/95682851597/F94197789640CHJK2L/F94197789811OJREZ1?F94197789813IGIHB3=94197789823&F94197789650C3DAA1=94197789653&F94197789654NGLSF3=94197789819&_=1658147745393 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.78 - - [18/Jul/2022:15:44:34 +0200] "GET /rudder/lift/comet/71018210693/F94197789640CHJK2L/F94197789811OJREZ1?F94197789813IGIHB3=94197789823&F94197789650C3DAA1=94197789653&F94197789654NGLSF3=94197789819&_=1658147745394 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:46:42 +0200] "GET /rudder/lift/comet/66271003563/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800927 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:47:52 +0200] "GET /rudder/lift/comet/90705451460/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800928 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:47:53 +0200] "GET /rudder/lift/comet/12093272947/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873068 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:49:02 +0200] "GET /rudder/lift/comet/4012806910/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800929 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:49:04 +0200] "GET /rudder/lift/comet/13128084336/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873069 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:50:13 +0200] "GET /rudder/lift/comet/37936390554/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800930 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:50:15 +0200] "GET /rudder/lift/comet/74695378557/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873070 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:51:24 +0200] "GET /rudder/lift/comet/94974027276/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800931 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:51:26 +0200] "GET /rudder/lift/comet/59544790107/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873071 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:52:34 +0200] "GET /rudder/lift/comet/47888502460/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800932 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:52:37 +0200] "GET /rudder/lift/comet/21869874776/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873072 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:53:45 +0200] "GET /rudder/lift/comet/61671153327/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800933 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:53:48 +0200] "GET /rudder/lift/comet/89693812004/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873073 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:46 +0200] "POST /rudder/secure/api/nodes/e71e186b-57b5-4bb0-bd3e-4e2c58189f04/applyPolicy HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:52 +0200] "GET /rudder/lift/comet/33439838180/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337693 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:55:52 +0200] "GET /rudder/lift/comet/50683585931/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800942 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:57:03 +0200] "GET /rudder/lift/comet/20069922140/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800943 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:57:03 +0200] "GET /rudder/lift/comet/6014671948/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337694 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:58:14 +0200] "GET /rudder/lift/comet/99651952362/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337695 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:58:14 +0200] "GET /rudder/lift/comet/3228984516/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800944 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:59:25 +0200] "GET /rudder/lift/comet/79868202730/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800945 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:59:25 +0200] "GET /rudder/lift/comet/91818095531/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337696 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:00:35 +0200] "GET /rudder/lift/comet/87613180946/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800946 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:00:36 +0200] "GET /rudder/lift/comet/97309445452/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337697 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:01:46 +0200] "GET /rudder/lift/comet/38792615777/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800947 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:01:47 +0200] "GET /rudder/lift/comet/95504318476/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337698 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:02:56 +0200] "GET /rudder/lift/comet/31352975345/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800948 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" 
access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:02:58 +0200] "GET /rudder/lift/comet/71159395356/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337699 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
Actions #7

Updated by François ARMAND almost 2 years ago

(so there is a lot of problems, and particularly:)

access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:46 +0200] "POST /rudder/secure/api/nodes/e71e186b-57b5-4bb0-bd3e-4e2c58189f04/applyPolicy HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 

Actions #8

Updated by François ARMAND almost 2 years ago

We found two requests next one to the other where one succeeds and not the other (in /var/log/rudder/apache2)

The node is prod-app-02.lab.rudder.io with IP: 192.168.210.82

(1: error  ) access.log-20220718:192.168.254.78 - - [18/Jul/2022:11:55:27 +0200] "POST /rudder/secure/api/nodes/61012f79-d086-4bf3-8353-76db142af5ff/applyPolicy HTTP/1.1" 502 341   "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/61012f79-d086-4bf3-8353-76db142af5ff?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 
(2: success) access.log-20220718:192.168.254.78 - - [18/Jul/2022:11:56:34 +0200] "POST /rudder/secure/api/nodes/61012f79-d086-4bf3-8353-76db142af5ff/applyPolicy HTTP/1.1" 200 20480 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/61012f79-d086-4bf3-8353-76db142af5ff?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" 

We can see that in that the webapp correctly contacted relayd, but in the first case, relayd answer 502 (with a short message) and in the second case, it succeeds with a bigger message.

From relayd point of view (journald logs), we don't have anything interesting:

Jul 18 11:55:27 rudder sudo[449210]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- prod-app-02.lab.rudder.io
Jul 18 11:56:13 rudder rudder-relayd[2597810]:  INFO rudder_relayd: Configuration reload requested
Jul 18 11:56:13 rudder rudder-relayd[2597810]:  INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" 
Jul 18 11:56:13 rudder rudder-relayd[2597810]:  INFO relayd::api: 127.0.0.1:52330 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 1.167373ms
Jul 18 11:56:14 rudder rudder-relayd[2597810]:  INFO rudder_relayd: Configuration reload requested
Jul 18 11:56:14 rudder rudder-relayd[2597810]:  INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" 
Jul 18 11:56:14 rudder rudder-relayd[2597810]:  INFO relayd::api: 127.0.0.1:52332 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 851.776µs
Jul 18 11:56:14 rudder rudder-relayd[2597810]:  INFO relayd::api: 127.0.0.1:52316 "POST /rudder/relay-api/1/remote-run/nodes/61012f79-d086-4bf3-8353-76db142af5ff HTTP/1.1" 200 "-" "rudder/remote run query for node 6101>
Jul 18 11:56:34 rudder sudo[449946]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- prod-app-02.lab.rudder.io
Jul 18 11:56:44 rudder rudder-relayd[2597810]:  INFO rudder_relayd: Configuration reload requested
Jul 18 11:56:44 rudder rudder-relayd[2597810]:  INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" 
Jul 18 11:56:44 rudder rudder-relayd[2597810]:  INFO relayd::api: 127.0.0.1:52344 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 1.791531ms
Jul 18 11:56:44 rudder rudder-relayd[2597810]:  INFO rudder_relayd: Configuration reload requested

And in /var/log/messages, we can see that relayd call cf-runagent in each case:

Jul 18 11:55:27 rudder systemd[1]: Started Session c22300 of user root.
Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder ........................................................................
Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder Hailing prod-app-02.lab.rudder.io : 5309
Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder ........................................................................
Jul 18 11:55:28 rudder cf-serverd[2552098]: rudder     info: 192.168.210.82>  Accepting connection
Jul 18 11:55:28 rudder cf-serverd[2552098]: CFEngine(server) rudder 192.168.210.82>  Accepting connection
...
Jul 18 11:56:34 rudder systemd[1]: Started Session c22302 of user root.
Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder ........................................................................
Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder Hailing prod-app-02.lab.rudder.io : 5309
Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder ........................................................................
Jul 18 11:56:38 rudder cf-serverd[2552098]: rudder     info: 192.168.210.82>  Accepting connection
Jul 18 11:56:38 rudder cf-serverd[2552098]: CFEngine(server) rudder 192.168.210.82>  Accepting connection

It seems that in the error case, a local agent run on server is running between the start of the remote run and its end.

Actions #9

Updated by François ARMAND almost 2 years ago

Nope, [18/Jul/2022:11:53:24 +0200] is in error but there is no run on root at the same time in /var/log/messages

Actions #10

Updated by François ARMAND almost 2 years ago

In every cases the run seems to be correctly run on the remote node.
So it might be that either the communication between root cf-runagent and node cf-serverd is broken (but logs does not seem to match that), or something is not correct between cf-runagent out to stdout and its forward to relayd.

> we need to add logs in relayd around remote run request to trace what is happening.
> log level trace for relayd
Actions #11

Updated by François ARMAND almost 2 years ago

I was able to reproduce the problem on the second try, and it seems consistent: two try in a row makes the second one fails.

With, with relayd log level in trace, we can see that:

- when it works, we have something like:

Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Starting remote run (asynchronous: false, keep_output: true)
Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Next-hops: []
Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Neighbors: [
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     "prod-www-02.lab.rudder.io",
Jul 19 18:11:58 rudder rudder-relayd[2597810]: ]
Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE remote-run: rudder_relayd::api::remote_run: Starting local remote run on [
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     "prod-www-02.lab.rudder.io",
Jul 19 18:11:58 rudder rudder-relayd[2597810]: ] with RemoteRun {
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     enabled: true,
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     command: "/opt/rudder/bin/rudder",
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     use_sudo: true,
Jul 19 18:11:58 rudder rudder-relayd[2597810]: }
Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Remote run command: 'Command {
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     std: "sudo" "/opt/rudder/bin/rudder" "remote" "run" "--" "prod-www-02.lab.rudder.io",
Jul 19 18:11:58 rudder rudder-relayd[2597810]:     kill_on_drop: false,
Jul 19 18:11:58 rudder rudder-relayd[2597810]: }'
Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: registering event source with poller: token=Token(1526726659), interests=READABLE | WRITABLE
Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE hyper::proto::h1::conn: flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }
...
...
Jul 19 18:12:19 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: deregistering event source from poller
Jul 19 18:12:19 rudder rudder-relayd[2597810]:  INFO relayd::api: 127.0.0.1:52108 "POST /rudder/relay-api/1/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09 HTTP/1.1" 200 "-" "rudder/remote run query for node 6c19>

- when it does not work, we have:

Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Starting remote run (asynchronous: false, keep_output: true)
Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Next-hops: []
Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Neighbors: [
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     "prod-www-02.lab.rudder.io",
Jul 19 18:12:29 rudder rudder-relayd[2597810]: ]
Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE remote-run: rudder_relayd::api::remote_run: Starting local remote run on [
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     "prod-www-02.lab.rudder.io",
Jul 19 18:12:29 rudder rudder-relayd[2597810]: ] with RemoteRun {
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     enabled: true,
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     command: "/opt/rudder/bin/rudder",
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     use_sudo: true,
Jul 19 18:12:29 rudder rudder-relayd[2597810]: }
Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Remote run command: 'Command {
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     std: "sudo" "/opt/rudder/bin/rudder" "remote" "run" "--" "prod-www-02.lab.rudder.io",
Jul 19 18:12:29 rudder rudder-relayd[2597810]:     kill_on_drop: false,
Jul 19 18:12:29 rudder rudder-relayd[2597810]: }'
Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: registering event source with poller: token=Token(1543503875), interests=READABLE | WRITABLE
Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE hyper::proto::h1::conn: flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }

[nothing more => connection never ending]

So something is not correctly happening with the end of the second connection.

Actions #12

Updated by Alexis Mousset almost 2 years ago

  • Assignee set to Alexis Mousset
Actions #13

Updated by Alexis Mousset almost 2 years ago

My findings so far:

  • The problem is easily reproducible by running several remote runs on the same node from the webapp
  • It is way harder to reproducde with API calls directly to apache
curl -v -X POST -k "https://127.0.0.1/rudder/relay-api/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09" --data keep_output=true  --data asynchronous=false

but is seems to be possible (but I'm not sure if it could be due to leftovers from webapp connections).

Actions #14

Updated by Alexis Mousset almost 2 years ago

Simpler reproduction:

  • start a remote run in the interface
  • run (to call relayd api directly)
curl -v -X POST -k "http://127.0.0.1:3030/rudder/relay-api/1/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09" --data keep_output=true  --data asynchronous=false
  • the webapp call fails with 502
  • Note: other parallel calls to relayd run normally
Actions #15

Updated by Alexis Mousset almost 2 years ago

relayd does not answer to the webapp when there are parallel calls:

# parallel calls from curl

juil. 20 16:58:35 rudder sudo[956787]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder
juil. 20 16:58:37 rudder sudo[957065]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder
juil. 20 16:59:21 rudder rudder-relayd[951486]:  INFO relayd::api: 127.0.0.1:45730 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 45.446005693s
juil. 20 16:59:22 rudder rudder-relayd[951486]:  INFO relayd::api: 127.0.0.1:45734 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 45.192579828s

# parallel calls from webapp and curl

juil. 20 16:59:30 rudder sudo[958023]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder
juil. 20 16:59:32 rudder sudo[958319]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder
juil. 20 17:00:16 rudder rudder-relayd[951486]:  INFO relayd::api: 127.0.0.1:45782 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 44.336092093s

# isolated call from webapp

juil. 20 17:01:49 rudder sudo[960519]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder
juil. 20 17:02:16 rudder rudder-relayd[951486]:  INFO relayd::api: 127.0.0.1:45850 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "rudder/remote run query for node root" 27.598727273s
Actions #16

Updated by Alexis Mousset almost 2 years ago

  • Status changed from New to In progress
Actions #17

Updated by Alexis Mousset almost 2 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to François ARMAND
  • Pull Request set to https://github.com/Normation/rudder/pull/4379
Actions #18

Updated by Alexis Mousset almost 2 years ago

Ok, so:

  • The core of the problem is remote-runs taking more than 60 seconds, hitting httpd default timeout and causing 502 error
  • It happens when running remote-runs in parallel on the same node (especially on our demo platform where nodes have 1 core)
  • Setting apache timeout to a higher value prevents the 502 error on calls to remote-run through apache, but the webapp has timeouts too
[2022-07-21 02:08:11+0200] ERROR nodes - Error occured when contacting internal remote-run API to apply classes on Node 'root': SystemError: An error occured; cause was: java.net.SocketTimeoutException: Read timed out
  • Setting a higher timeout value in the http client removes the java timeout exception, but it does not seem enough to see the result in the interface.
Actions #19

Updated by Alexis Mousset almost 2 years ago

  • Status changed from Pending technical review to Pending release
Actions #20

Updated by Alexis Mousset over 1 year ago

  • Target version changed from 7.1.4 to 7.1.3
Actions #21

Updated by Alexis Mousset over 1 year ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 7.1.3 which was released today.

Actions #22

Updated by Nicolas CHARLES 10 months ago

  • Priority changed from 40 to 54
  • Regression set to No

This should be done in /opt/rudder/etc/rudder-apache-webapp-ssl.conf as well

Actions

Also available in: Atom PDF