Architecture #16237

closed

Trigger agent run - improve errors display

Added by Florian Heigl over 4 years ago. Updated 3 months ago.

Status: Resolved
Priority: N/A
Category: Web - UI & UX
Target version: -
Effort required:
Name check: To do
Fix check: To do
Regression: No

Description

I was just trying to fire off a client policy run from the Node details -> Compliance Reports -> Trigger agent run button.
Click the button, it grays out and shows Loading (Spinning Circle).
That makes sense.
But it seems to not work (idk).
And it stays like that... forever?
The normal run time of that agent is like 1-2 seconds, but it has been displaying the same thing for a few minutes now.

After like 5 minutes it aborted.

@502 - Proxy Error
Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request

Reason: Error reading from remote server

Apache/2.4.25 (Debian) Server at 172.16.52.143 Port 443@

It would be good to see a "still trying, retrying etc." status. I was thinking it was stuck in a dead code path.
Also the proxy error isn't very helpful: I don't know whether there was one attempt with a 300s timeout or 5 attempts, I don't know whether an initial handshake worked, etc.

If I imagine myself as a user with only GUI permissions (let's say they can set properties and launch a run), this gives them very little info.

If possible, document the phases of this process and build the message based on it.
"Failed at step 1 - connecting to agent for trigger. See <doc url>"
and <doc url> says:
"if you receive reports from the agent, it's a firewall issue
if you receive no reports from the agent, the agent might be down"
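
A minimal sketch of what such a phase-based message could look like. Everything here is hypothetical: the phase names for steps 2 and 3, the `failure_message` helper and the `reports_received` flag are assumptions for illustration, not Rudder's actual code, and `<doc url>` is left as a placeholder.

```python
from enum import Enum
from typing import Optional

# Placeholder only: the real documentation URL is not known here.
DOC_URL = "<doc url>"


class Phase(Enum):
    """Hypothetical phases of a triggered agent run (names are assumptions)."""
    CONNECT = (1, "connecting to agent for trigger")
    RUN = (2, "running agent")
    REPORT = (3, "receiving reports")

    @property
    def step(self) -> int:
        return self.value[0]

    @property
    def label(self) -> str:
        return self.value[1]


def failure_message(phase: Phase, reports_received: Optional[bool] = None) -> str:
    """Name the failed step and, if known, add the hint the doc page would give."""
    msg = f"Failed at step {phase.step} - {phase.label}. See {DOC_URL}"
    # The hint only applies to connection failures, and only when we know
    # whether the server still receives reports from that node.
    if phase is Phase.CONNECT and reports_received is not None:
        if reports_received:
            hint = "reports are still received from the agent, likely a firewall issue"
        else:
            hint = "no reports are received from the agent, the agent might be down"
        msg += f" (hint: {hint})"
    return msg


print(failure_message(Phase.CONNECT, reports_received=False))
```

If the server already knows whether reports arrive from the node, the hint can be part of the message itself instead of being left to the doc page.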

Footnote 1: at that point we see that, if we document this, Rudder can diagnose the problem on its own.

Footnote 2: I checked the webapp logfile and found nothing there; most likely it would be in some Apache log. But my general understanding is "something didn't work", and for a UI-exposed feature I would like the UI to tell me what happened, not present me with an HTTP timeout.

Basically Rudder says "I tried something and here's proof it didn't work".

Footnote 3: I ran find / -name parallel and didn't find it. If you still rely on GNU parallel, you are missing a dependency on it. You need to have dependencies for the things you use ;)

Footnote 4: IMO this is a blocker, you can't release with GUI features that just bring up an HTTP error.


Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #16222: Exception "fiberFailed" when running agent from UI (Released, Vincent MEMBRÉ)
Actions #1

Updated by François ARMAND over 4 years ago

  • Related to Bug #16222: Exception "fiberFailed" when running agent from UI added
Actions #2

Updated by François ARMAND over 4 years ago

  • Assignee set to Elaad FURREEDAN

Hello, thanks for reporting. We have a pending bug on run agent, perhaps it's the same one (#16222), although the absence of a trace in the webapp log may indicate that it's different.

But you are right on everything:

- it's a blocker for release - we won't release a new broken feature,
- we should give a better error message (if the problem is the same as in #16222, the missing error reporting may be because the exception is not correctly handled),
- yes, we need to install/require dependencies that we use.

Elaad, would you please test/reproduce that one when you work on #16222?

Actions #3

Updated by François ARMAND about 4 years ago

  • Severity changed from Major - prevents use of part of Rudder | no simple workaround to Minor - inconvenience | misleading | easy workaround
  • Priority changed from 52 to 29

I think there are several things in that bug report.

The proxy error etc. is a bug, and I think we corrected several such ones (some needed to wait for 6.1). Please open other tickets if you still see them.

I will keep that ticket for the part where it would be helpful to have more interactive information about what is being done currently (like: "connecting to node", "running agent", "sending reports", etc). For that, I think the level is "misleading".
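
As a rough illustration of that kind of interactive information, here is a minimal sketch in which each step is announced before it runs. The `run_with_status` helper, the `notify` callback and the simulated failure are assumptions for illustration only, not the actual Rudder implementation.

```python
from typing import Callable, Iterable, List, Tuple

# A step is a label shown to the user plus the action to run.
Step = Tuple[str, Callable[[], None]]


def run_with_status(steps: Iterable[Step], notify: Callable[[str], None]) -> bool:
    """Run each step in order, telling the UI what is being done right now."""
    for number, (label, action) in enumerate(steps, start=1):
        notify(f"step {number}: {label}...")
        try:
            action()
        except Exception as err:
            # Report which step failed instead of a bare HTTP error.
            notify(f"step {number}: {label} failed: {err}")
            return False
    notify("agent run finished")
    return True


def unreachable() -> None:
    """Simulated failure, standing in for a node that never answers."""
    raise TimeoutError("no answer from node")


messages: List[str] = []
ok = run_with_status(
    [
        ("connecting to node", unreachable),
        ("running agent", lambda: None),
        ("sending reports", lambda: None),
    ],
    messages.append,
)
print(ok, messages)
```

In a real UI, `notify` would push the message to the page (for example replacing the spinner text) rather than appending to a list.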

Actions #4

Updated by François ARMAND about 4 years ago

  • User visibility changed from Operational - other Techniques | Rudder settings | Plugins to Getting started - demo | first install | Technique editor and level 1 Techniques
  • Priority changed from 29 to 45

(but it's typically one of the first things you will try, so "getting started" visibility)

Actions #5

Updated by Alexis Mousset over 2 years ago

  • Tracker changed from Bug to Architecture
  • Subject changed from Trigger agent run - timeout? to Trigger agent run - improve errors display
  • Severity deleted (Minor - inconvenience | misleading | easy workaround)
  • User visibility deleted (Getting started - demo | first install | Technique editor and level 1 Techniques)
  • Priority deleted (45)
Actions #6

Updated by François ARMAND 3 months ago

  • Status changed from New to Resolved
  • Regression set to No

This was greatly improved, several times. Closing.
