Bug #16241

closed

Improve reliability and error reporting of report upload in agent

Added by Florian Heigl about 5 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: N/A
Assignee: -
Category: Agent
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility: Operational - other Techniques | Rudder settings | Plugins
Effort required:
Priority: 27
Name check: To do
Fix check: To do
Regression:

Description

I suppose you are aware, but the agent modifications for the HTTPS protocol seem unfinished.

E| n/a           sshKeyDistribution        Flush SSH file            tze                The keys for user tze were not requested to be flush
E| n/a           sshKeyDistribution        Flush SSH file            monitoring key     The keys for user svcmon were not requested to be flush
E| compliant     sudoParameters            sudoersFile                                  The sudoers file did not require any modification
E| compliant     sudoParameters            sudoersFile                                  The sudoers file did not require any modification
E| n/a           Common                    Monitoring                                   No Rudder monitoring information to share with the server

## Summary #####################################################################
83 components verified in 16 directives
   => 83 components in Enforce mode
      -> 63 compliant
      -> 2 repaired
      -> 17 not-applicable
      -> 1 error
Execution time: 11.48s
################################################################################
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>201 Created</title>
</head><body>
<h1>Created</h1>
<p>Resource /reports/2019-11-21T17%3A34%3A22%2B00%3A00%40b82edcfc-6fdf-41ab-9c7f-679139cec7b4.log.gz has been created.</p>
<hr />
<address>Apache/2.4.25 (Debian) Server at 172.16.52.143 Port 443</address>
</body></html>

The posting of the reports should be wrapped in the agent output; currently the raw HTTP response leaks into the output of both "rudder agent run" and "rudder agent inventory".

This can be confusing because at this point the agent has already reported that the run is finished. So it is done, but then it does another thing, which can also fail even though the agent just completed OK.
The response also uses a different format (HTML ;-) than the agent output.

This output should definitely appear no later than "Summary" and the upload should also be included in the execution time.
(I would agree that a run can be successful even if the reporting fails)

Could you please run some tests where the HTTP destination

  • drops the connection
  • writes at 10 bytes/s
  • has a full disk
  • is not writable (permissions)

(to check that there is timeout handling in the upload process and that failed uploads ARE being caught)?
The severity of this issue is best set based on the behaviour when the upload has problems.
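
The failure modes listed above could be exercised against a small local test double. The following is a minimal sketch, not part of the Rudder test suite: `FaultyHandler`, `put_report`, `start_faulty_server`, the URL paths, and the 507 and 403 status codes standing in for "disk full" and "permissions" are all assumptions made for illustration. The slow writer is approximated by a server that stalls longer than the client timeout.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FaultyHandler(BaseHTTPRequestHandler):
    """Hypothetical report endpoint; the failure mode is selected by the URL path."""

    def do_PUT(self):
        if self.path == "/drop":
            # Return without answering: the server then closes the socket.
            return
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if self.path == "/slow":
            time.sleep(1.0)  # stall past the client timeout
        # Assumed status codes for "disk full" and "not writable".
        code = {"/full": 507, "/denied": 403}.get(self.path, 201)
        try:
            self.send_response(code)
            self.send_header("Content-Length", "0")
            self.end_headers()
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the test output quiet


def start_faulty_server():
    """Start the test double on a free local port and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), FaultyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_report(host, port, path, body, timeout=0.5):
    """PUT body and return (ok, detail) instead of raising on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", path, body=body)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return False, type(exc).__name__
    finally:
        conn.close()
    return status in (200, 201, 204), f"HTTP {status}"
```

The point of returning a pair rather than raising is that every failure mode ends up in the same place, so the caller can report it in one consistent format and decide separately whether the run itself counts as successful.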


Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #16112: HTTP report PUT prints useless messages at the end of the run (Released, Benoît PECCATTE)
Actions #1

Updated by Alexis Mousset about 5 years ago

  • Related to Bug #16112: HTTP report PUT prints useless messages at the end of the run added
Actions #2

Updated by Alexis Mousset about 5 years ago

  • Subject changed from Agent shows raw http results to Improve reliability and error reporting of report upload in agent

Thanks a lot for this feedback!

The HTTP result problem has already been fixed in #16112, so I'm renaming this issue to reflect the rest of the comments.

Actions #3

Updated by Alexis Mousset about 5 years ago

Florian Heigl wrote:

The posting of the reports should be wrapped in the agent output; currently the raw HTTP response leaks into the output of both "rudder agent run" and "rudder agent inventory".

This can be confusing because at this point the agent has already reported that the run is finished. So it is done, but then it does another thing, which can also fail even though the agent just completed OK.
The response also uses a different format (HTML ;-) than the agent output.

This output should definitely appear no later than "Summary" and the upload should also be included in the execution time.
(I would agree that a run can be successful even if the reporting fails)

I agree that ideally the report status should appear before the end of the run. The problem is that this is currently hard to do because of the architecture of the wrapping scripts, and because we want the agent to stop executing before the report is actually signed and sent (we pipe its output into the report file we send). We'll try to find a better solution to put the report sending status into the summary, but it may not happen before 6.0.0 is released.

In the meantime I'll reformat the output to make the PUT status look more like a part of the summary (#16245).

Could you please run some tests where the HTTP destination

  • drops the connection
  • writes at 10 bytes/s
  • has a full disk
  • is not writable (permissions)

(to check that there is timeout handling in the upload process and that failed uploads ARE being caught)?
The severity of this issue is best set based on the behaviour when the upload has problems.

When the PUT fails, the report is kept on the node, and the agent has a policy (in system techniques) to send reports that were not properly sent at the end of previous runs. This allows backfilling compliance reports in case of network problems. I opened #16242 to handle the purge of old reports and avoid filling local disks when the server is not available.
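
The keep, resend, and purge behaviour described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual system technique: `flush_pending_reports`, the spool directory, the `send` callback, and the age limit are all hypothetical names and parameters, and deleting a report once it has been sent is also an assumption.

```python
import os
import time


def flush_pending_reports(spool_dir, send, max_age_seconds, now=None):
    """Resend reports left over from earlier runs, oldest first.

    send(path) -> bool is a hypothetical uploader. Reports that are sent
    are deleted, reports that fail stay on disk for the next run, and
    reports older than max_age_seconds are purged without being sent.
    """
    now = time.time() if now is None else now
    sent, kept, purged = [], [], []
    paths = [os.path.join(spool_dir, name) for name in os.listdir(spool_dir)]
    for path in sorted(paths, key=os.path.getmtime):
        name = os.path.basename(path)
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)  # too old: drop it so the local disk cannot fill up
            purged.append(name)
        elif send(path):
            os.remove(path)  # delivered: nothing left to retry
            sent.append(name)
        else:
            kept.append(name)  # still failing: retry at the next run
    return sent, kept, purged
```

Purging by age before attempting a resend bounds the disk usage even when the server stays unreachable for a long time, at the cost of losing the oldest compliance history.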

We will run more tests on different failure modes, particularly timeouts and related failures. We are using curl defaults here, but we may want to customize them.
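
As an illustration of what customizing those defaults could look like, here is a sketch that builds a curl command line with explicit limits. The flags are standard curl options, but the function name `curl_put_args`, the numeric values, and the overall shape of the command are assumptions; the agent's actual curl invocation is not shown in this thread.

```python
def curl_put_args(report_path, url, connect_timeout=10, max_time=120,
                  min_speed=100, min_speed_window=30):
    """Build a curl PUT command with explicit timeouts (all values hypothetical)."""
    return [
        "curl",
        "--silent", "--show-error",  # no progress meter, but keep error messages
        "--fail",                    # non-zero exit status on HTTP errors (>= 400)
        "--connect-timeout", str(connect_timeout),  # seconds allowed for connecting
        "--max-time", str(max_time),                # seconds allowed for the whole transfer
        "--speed-limit", str(min_speed),            # abort if slower than this (bytes/s)...
        "--speed-time", str(min_speed_window),      # ...for this many seconds in a row
        "--upload-file", report_path,               # send the file with PUT
        url,
    ]
```

The `--speed-limit` and `--speed-time` pair is what would catch a destination that accepts data at something like 10 bytes/s, since such a transfer never hits a connect timeout and could otherwise run until `--max-time` expires.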

Actions #4

Updated by Alexis Mousset about 5 years ago

  • Description updated (diff)
Actions #5

Updated by Alexis Mousset almost 5 years ago

  • Target version changed from 6.0.0 to 6.0.1
Actions #6

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 6.0.1 to 6.0.2
Actions #7

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 6.0.2 to 6.0.3
Actions #8

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 6.0.3 to 6.0.4
Actions #9

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 6.0.4 to 6.0.5
Actions #10

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 6.0.5 to 6.0.6
Actions #11

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 6.0.6 to 6.0.7
Actions #12

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 6.0.7 to 6.0.8
Actions #13

Updated by François ARMAND over 4 years ago

  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Rudder settings | Plugins
  • Priority changed from 0 to 29
Actions #14

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 6.0.8 to 6.0.9
Actions #15

Updated by Vincent MEMBRÉ about 4 years ago

  • Target version changed from 6.0.9 to 6.0.10
  • Priority changed from 29 to 28
Actions #16

Updated by Vincent MEMBRÉ about 4 years ago

  • Target version changed from 6.0.10 to 798
  • Priority changed from 28 to 27
Actions #17

Updated by Benoît PECCATTE over 3 years ago

  • Target version changed from 798 to 6.1.14
Actions #18

Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 6.1.14 to 6.1.15
Actions #19

Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 6.1.15 to 6.1.16
Actions #20

Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 6.1.16 to 6.1.17
Actions #21

Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 6.1.17 to 6.1.18
Actions #22

Updated by Vincent MEMBRÉ almost 3 years ago

  • Target version changed from 6.1.18 to 6.1.19
Actions #23

Updated by Alexis Mousset almost 3 years ago

  • Status changed from New to Resolved

Closing, the display problem has been fixed and reporting reliability seems acceptable.
