Bug #21170
closed502 - Proxy Error when triggering Rudder agent
Description
I often get this error when triggering Rudder agent
Files
Updated by Florent NEYRON over 2 years ago
- Category set to Agent
- Target version set to 7.1.2
Updated by Alexis Mousset over 2 years ago
- Category changed from Agent to Web - Nodes & inventories
Updated by Alexis Mousset over 2 years ago
- Status changed from New to Rejected
- User visibility set to Infrequent - complex configurations | third party integrations
- Priority changed from 0 to 41
IIRC this was due to incorrect DNS configuration on our lab. Please reopen and give more information (webapp, apache and relayd logs) if the problem still occurs.
Updated by François ARMAND over 2 years ago
- Status changed from Rejected to New
- Target version changed from 7.1.2 to 7.1.3
- UX impact set to I dislike using that feature
- Priority changed from 41 to 40
So, it is NOT a DNS problem. We are still triggering it once in a while when doing several changes and triggering remote run frequently.
Updated by Vincent MEMBRÉ over 2 years ago
- Target version changed from 7.1.3 to 7.1.4
Updated by François ARMAND over 2 years ago
Not that we have hundreds of proxy errors log in /var/log/rudder/apache2
access.log-20220718:192.168.254.10 - - [18/Jul/2022:15:42:18 +0200] "GET /rudder/lift/comet/93744016213/F94197790062QGIY3U/F94197790988GIIXQT?F94197790071TTMR41=94197790074&F94197790076KFMU2D=94197791586&_=1658149891360 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/configurationManager/directiveManagement" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [18/Jul/2022:15:43:23 +0200] "GET /rudder/lift/comet/95682851597/F94197789640CHJK2L/F94197789811OJREZ1?F94197789813IGIHB3=94197789823&F94197789650C3DAA1=94197789653&F94197789654NGLSF3=94197789819&_=1658147745393 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.78 - - [18/Jul/2022:15:44:34 +0200] "GET /rudder/lift/comet/71018210693/F94197789640CHJK2L/F94197789811OJREZ1?F94197789813IGIHB3=94197789823&F94197789650C3DAA1=94197789653&F94197789654NGLSF3=94197789819&_=1658147745394 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:46:42 +0200] "GET /rudder/lift/comet/66271003563/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800927 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:47:52 +0200] "GET /rudder/lift/comet/90705451460/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800928 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:47:53 +0200] "GET /rudder/lift/comet/12093272947/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873068 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:49:02 +0200] "GET /rudder/lift/comet/4012806910/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800929 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:49:04 +0200] "GET /rudder/lift/comet/13128084336/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873069 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:50:13 +0200] "GET /rudder/lift/comet/37936390554/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800930 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:50:15 +0200] "GET /rudder/lift/comet/74695378557/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873070 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:51:24 +0200] "GET /rudder/lift/comet/94974027276/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800931 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:51:26 +0200] "GET /rudder/lift/comet/59544790107/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873071 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:52:34 +0200] "GET /rudder/lift/comet/47888502460/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800932 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:52:37 +0200] "GET /rudder/lift/comet/21869874776/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873072 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:53:45 +0200] "GET /rudder/lift/comet/61671153327/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772233&_=1658216800933 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:53:48 +0200] "GET /rudder/lift/comet/89693812004/F13873897723300S2HU0/F1387389772505S55JVA?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772508&_=1658216873073 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/nodes" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:46 +0200] "POST /rudder/secure/api/nodes/e71e186b-57b5-4bb0-bd3e-4e2c58189f04/applyPolicy HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:52 +0200] "GET /rudder/lift/comet/33439838180/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337693 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:55:52 +0200] "GET /rudder/lift/comet/50683585931/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800942 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:57:03 +0200] "GET /rudder/lift/comet/20069922140/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800943 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:57:03 +0200] "GET /rudder/lift/comet/6014671948/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337694 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:58:14 +0200] "GET /rudder/lift/comet/99651952362/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337695 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:58:14 +0200] "GET /rudder/lift/comet/3228984516/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800944 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.50 - - [19/Jul/2022:09:59:25 +0200] "GET /rudder/lift/comet/79868202730/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800945 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:59:25 +0200] "GET /rudder/lift/comet/91818095531/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337696 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:00:35 +0200] "GET /rudder/lift/comet/87613180946/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800946 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:00:36 +0200] "GET /rudder/lift/comet/97309445452/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337697 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:01:46 +0200] "GET /rudder/lift/comet/38792615777/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800947 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:01:47 +0200] "GET /rudder/lift/comet/95504318476/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337698 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" access.log-20220718:192.168.254.50 - - [19/Jul/2022:10:02:56 +0200] "GET /rudder/lift/comet/31352975345/F1387389772208JXUYJF/F138738977220933BOR5?F1387389772211BOMGUO=1387389772222&F1387389772218BEAHWU=1387389772221&F1387389772223MT5IPU=1387389772881&_=1658216800948 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/index.html" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" access.log-20220718:192.168.254.78 - - [19/Jul/2022:10:02:58 +0200] "GET /rudder/lift/comet/71159395356/F13873897723300S2HU0/F13873897725974DUIUY?F1387389772341YMQBDZ=1387389772344&F13873897723451STXLF=1387389772882&_=1658217337699 HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
Updated by François ARMAND over 2 years ago
(so there is a lot of problems, and particularly:)
access.log-20220718:192.168.254.78 - - [19/Jul/2022:09:55:46 +0200] "POST /rudder/secure/api/nodes/e71e186b-57b5-4bb0-bd3e-4e2c58189f04/applyPolicy HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/e71e186b-57b5-4bb0-bd3e-4e2c58189f04?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
Updated by François ARMAND over 2 years ago
We found two requests next one to the other where one succeeds and not the other (in /var/log/rudder/apache2)
The node is prod-app-02.lab.rudder.io with IP: 192.168.210.82
(1: error ) access.log-20220718:192.168.254.78 - - [18/Jul/2022:11:55:27 +0200] "POST /rudder/secure/api/nodes/61012f79-d086-4bf3-8353-76db142af5ff/applyPolicy HTTP/1.1" 502 341 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/61012f79-d086-4bf3-8353-76db142af5ff?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" (2: success) access.log-20220718:192.168.254.78 - - [18/Jul/2022:11:56:34 +0200] "POST /rudder/secure/api/nodes/61012f79-d086-4bf3-8353-76db142af5ff/applyPolicy HTTP/1.1" 200 20480 "https://rudder.lab.rudder.io/rudder/secure/nodeManager/node/61012f79-d086-4bf3-8353-76db142af5ff?displayCompliance=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
We can see that in that the webapp correctly contacted relayd, but in the first case, relayd answer 502 (with a short message) and in the second case, it succeeds with a bigger message.
From relayd point of view (journald logs), we don't have anything interesting:
Jul 18 11:55:27 rudder sudo[449210]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- prod-app-02.lab.rudder.io Jul 18 11:56:13 rudder rudder-relayd[2597810]: INFO rudder_relayd: Configuration reload requested Jul 18 11:56:13 rudder rudder-relayd[2597810]: INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" Jul 18 11:56:13 rudder rudder-relayd[2597810]: INFO relayd::api: 127.0.0.1:52330 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 1.167373ms Jul 18 11:56:14 rudder rudder-relayd[2597810]: INFO rudder_relayd: Configuration reload requested Jul 18 11:56:14 rudder rudder-relayd[2597810]: INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" Jul 18 11:56:14 rudder rudder-relayd[2597810]: INFO relayd::api: 127.0.0.1:52332 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 851.776µs Jul 18 11:56:14 rudder rudder-relayd[2597810]: INFO relayd::api: 127.0.0.1:52316 "POST /rudder/relay-api/1/remote-run/nodes/61012f79-d086-4bf3-8353-76db142af5ff HTTP/1.1" 200 "-" "rudder/remote run query for node 6101> Jul 18 11:56:34 rudder sudo[449946]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- prod-app-02.lab.rudder.io Jul 18 11:56:44 rudder rudder-relayd[2597810]: INFO rudder_relayd: Configuration reload requested Jul 18 11:56:44 rudder rudder-relayd[2597810]: INFO rudder_relayd::data::node: Parsing nodes list from "/var/rudder/lib/relay/nodeslist.json" Jul 18 11:56:44 rudder rudder-relayd[2597810]: INFO relayd::api: 127.0.0.1:52344 "POST /rudder/relay-api/1/system/reload HTTP/1.1" 200 "-" "curl/7.61.1" 1.791531ms Jul 18 11:56:44 rudder rudder-relayd[2597810]: INFO rudder_relayd: Configuration reload requested
And in /var/log/messages, we can see that relayd call cf-runagent
in each case:
Jul 18 11:55:27 rudder systemd[1]: Started Session c22300 of user root. Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder ........................................................................ Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder Hailing prod-app-02.lab.rudder.io : 5309 Jul 18 11:55:28 rudder cf-runagent[449324]: CFEngine(runagent) rudder ........................................................................ Jul 18 11:55:28 rudder cf-serverd[2552098]: rudder info: 192.168.210.82> Accepting connection Jul 18 11:55:28 rudder cf-serverd[2552098]: CFEngine(server) rudder 192.168.210.82> Accepting connection ... Jul 18 11:56:34 rudder systemd[1]: Started Session c22302 of user root. Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder ........................................................................ Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder Hailing prod-app-02.lab.rudder.io : 5309 Jul 18 11:56:35 rudder cf-runagent[450060]: CFEngine(runagent) rudder ........................................................................ Jul 18 11:56:38 rudder cf-serverd[2552098]: rudder info: 192.168.210.82> Accepting connection Jul 18 11:56:38 rudder cf-serverd[2552098]: CFEngine(server) rudder 192.168.210.82> Accepting connection
It seems that in the error case, a local agent run on server is running between the start of the remote run and its end.
Updated by François ARMAND over 2 years ago
Nope, [18/Jul/2022:11:53:24 +0200]
is in error but there is no run on root at the same time in /var/log/messages
Updated by François ARMAND over 2 years ago
In every cases the run seems to be correctly run on the remote node.
So it might be that either the communication between root cf-runagent and node cf-serverd is broken (but logs does not seem to match that), or something is not correct between cf-runagent out to stdout and its forward to relayd.
> log level trace for relayd
Updated by François ARMAND over 2 years ago
I was able to reproduce the problem on the second try, and it seems consistent: two try in a row makes the second one fails.
With, with relayd log level in trace, we can see that:
- when it works, we have something like:
Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Starting remote run (asynchronous: false, keep_output: true) Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Next-hops: [] Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Neighbors: [ Jul 19 18:11:58 rudder rudder-relayd[2597810]: "prod-www-02.lab.rudder.io", Jul 19 18:11:58 rudder rudder-relayd[2597810]: ] Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE remote-run: rudder_relayd::api::remote_run: Starting local remote run on [ Jul 19 18:11:58 rudder rudder-relayd[2597810]: "prod-www-02.lab.rudder.io", Jul 19 18:11:58 rudder rudder-relayd[2597810]: ] with RemoteRun { Jul 19 18:11:58 rudder rudder-relayd[2597810]: enabled: true, Jul 19 18:11:58 rudder rudder-relayd[2597810]: command: "/opt/rudder/bin/rudder", Jul 19 18:11:58 rudder rudder-relayd[2597810]: use_sudo: true, Jul 19 18:11:58 rudder rudder-relayd[2597810]: } Jul 19 18:11:58 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Remote run command: 'Command { Jul 19 18:11:58 rudder rudder-relayd[2597810]: std: "sudo" "/opt/rudder/bin/rudder" "remote" "run" "--" "prod-www-02.lab.rudder.io", Jul 19 18:11:58 rudder rudder-relayd[2597810]: kill_on_drop: false, Jul 19 18:11:58 rudder rudder-relayd[2597810]: }' Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: registering event source with poller: token=Token(1526726659), interests=READABLE | WRITABLE Jul 19 18:11:58 rudder rudder-relayd[2597810]: TRACE hyper::proto::h1::conn: flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy } ... ... Jul 19 18:12:19 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: deregistering event source from poller Jul 19 18:12:19 rudder rudder-relayd[2597810]: INFO relayd::api: 127.0.0.1:52108 "POST /rudder/relay-api/1/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09 HTTP/1.1" 200 "-" "rudder/remote run query for node 6c19>
- when it does not work, we have:
Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Starting remote run (asynchronous: false, keep_output: true) Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Next-hops: [] Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Neighbors: [ Jul 19 18:12:29 rudder rudder-relayd[2597810]: "prod-www-02.lab.rudder.io", Jul 19 18:12:29 rudder rudder-relayd[2597810]: ] Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE remote-run: rudder_relayd::api::remote_run: Starting local remote run on [ Jul 19 18:12:29 rudder rudder-relayd[2597810]: "prod-www-02.lab.rudder.io", Jul 19 18:12:29 rudder rudder-relayd[2597810]: ] with RemoteRun { Jul 19 18:12:29 rudder rudder-relayd[2597810]: enabled: true, Jul 19 18:12:29 rudder rudder-relayd[2597810]: command: "/opt/rudder/bin/rudder", Jul 19 18:12:29 rudder rudder-relayd[2597810]: use_sudo: true, Jul 19 18:12:29 rudder rudder-relayd[2597810]: } Jul 19 18:12:29 rudder rudder-relayd[2597810]: DEBUG remote-run: rudder_relayd::api::remote_run: Remote run command: 'Command { Jul 19 18:12:29 rudder rudder-relayd[2597810]: std: "sudo" "/opt/rudder/bin/rudder" "remote" "run" "--" "prod-www-02.lab.rudder.io", Jul 19 18:12:29 rudder rudder-relayd[2597810]: kill_on_drop: false, Jul 19 18:12:29 rudder rudder-relayd[2597810]: }' Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE remote-run: mio::poll: registering event source with poller: token=Token(1543503875), interests=READABLE | WRITABLE Jul 19 18:12:29 rudder rudder-relayd[2597810]: TRACE hyper::proto::h1::conn: flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy } [nothing more => connection never ending]
So something is not correctly happening with the end of the second connection.
Updated by Alexis Mousset over 2 years ago
My findings so far:
- The problem is easily reproducible by running several remote runs on the same node from the webapp
- It is way harder to reproducde with API calls directly to apache
curl -v -X POST -k "https://127.0.0.1/rudder/relay-api/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09" --data keep_output=true --data asynchronous=false
but is seems to be possible (but I'm not sure if it could be due to leftovers from webapp connections).
- I see a lot of CLOSE_WAIT connection between the webapp and httpd (looks a bit like https://stackoverflow.com/questions/55271192/connections-leaking-with-state-close-wait-with-httpclient). Could be related.
Updated by Alexis Mousset over 2 years ago
Simpler reproduction:
- start a remote run in the interface
- run (to call relayd api directly)
curl -v -X POST -k "http://127.0.0.1:3030/rudder/relay-api/1/remote-run/nodes/6c199162-c23b-47f7-a27b-e0ef93994a09" --data keep_output=true --data asynchronous=false
- the webapp call fails with 502
- Note: other parallel calls to relayd run normally
Updated by Alexis Mousset over 2 years ago
relayd does not answer to the webapp when there are parallel calls:
# parallel calls from curl juil. 20 16:58:35 rudder sudo[956787]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder juil. 20 16:58:37 rudder sudo[957065]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder juil. 20 16:59:21 rudder rudder-relayd[951486]: INFO relayd::api: 127.0.0.1:45730 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 45.446005693s juil. 20 16:59:22 rudder rudder-relayd[951486]: INFO relayd::api: 127.0.0.1:45734 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 45.192579828s # parallel calls from webapp and curl juil. 20 16:59:30 rudder sudo[958023]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder juil. 20 16:59:32 rudder sudo[958319]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder juil. 20 17:00:16 rudder rudder-relayd[951486]: INFO relayd::api: 127.0.0.1:45782 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "curl/7.61.1" 44.336092093s # isolated call from webapp juil. 20 17:01:49 rudder sudo[960519]: rudder-relayd : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/rudder/bin/rudder remote run -- rudder juil. 20 17:02:16 rudder rudder-relayd[951486]: INFO relayd::api: 127.0.0.1:45850 "POST /rudder/relay-api/1/remote-run/nodes/root HTTP/1.1" 200 "-" "rudder/remote run query for node root" 27.598727273s
Updated by Alexis Mousset over 2 years ago
- Status changed from New to In progress
Updated by Alexis Mousset over 2 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to François ARMAND
- Pull Request set to https://github.com/Normation/rudder/pull/4379
Updated by Alexis Mousset over 2 years ago
Ok, so:
- The core of the problem is remote-runs taking more than 60 seconds, hitting httpd default timeout and causing 502 error
- It happens when running remote-runs in parallel on the same node (especially on our demo platform where nodes have 1 core)
- Setting apache timeout to a higher value prevents the 502 error on calls to remote-run through apache, but the webapp has timeouts too
[2022-07-21 02:08:11+0200] ERROR nodes - Error occured when contacting internal remote-run API to apply classes on Node 'root': SystemError: An error occured; cause was: java.net.SocketTimeoutException: Read timed out
- Setting a higher timeout value in the http client removes the java timeout exception, but it does not seem enough to see the result in the interface.
Updated by Alexis Mousset over 2 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|f307afaca433b83db8a0619ae2acd1cdd2b15090.
Updated by Alexis Mousset over 2 years ago
- Target version changed from 7.1.4 to 7.1.3
Updated by Alexis Mousset over 2 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 7.1.3 which was released today.
Updated by Nicolas CHARLES over 1 year ago
- Priority changed from 40 to 54
- Regression set to No
This should be done in /opt/rudder/etc/rudder-apache-webapp-ssl.conf as well