Bug #16241
closed
Improve reliability and error reporting of report upload in agent
Description
I suppose you are aware, but the agent modifications for the HTTPS protocol seem unfinished.
E| n/a sshKeyDistribution Flush SSH file tze The keys for user tze were not requested to be flush
E| n/a sshKeyDistribution Flush SSH file monitoring key The keys for user svcmon were not requested to be flush
E| compliant sudoParameters sudoersFile The sudoers file did not require any modification
E| compliant sudoParameters sudoersFile The sudoers file did not require any modification
E| n/a Common Monitoring No Rudder monitoring information to share with the server
## Summary #####################################################################
83 components verified in 16 directives
 => 83 components in Enforce mode
 -> 63 compliant
 -> 2 repaired
 -> 17 not-applicable
 -> 1 error
Execution time: 11.48s
################################################################################
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>201 Created</title>
</head><body>
<h1>Created</h1>
<p>Resource /reports/2019-11-21T17%3A34%3A22%2B00%3A00%40b82edcfc-6fdf-41ab-9c7f-679139cec7b4.log.gz has been created.</p>
<hr />
<address>Apache/2.4.25 (Debian) Server at 172.16.52.143 Port 443</address>
</body></html>
The posting of the reports should be wrapped in the agent output; currently it leaks for both "rudder agent run" and "rudder agent inventory".
It can be confusing because at this point the agent has already reported that it finished the run. So it's done, but then it does another thing, which can also fail even though the agent just completed OK.
It uses a different format (HTML ;-) than the agent output.
This output should definitely appear no later than "Summary", and the upload should also be included in the execution time.
(I would agree that a run can be successful even if the reporting fails)
Could you please run some tests where the HTTP destination
- drops the connection
- writes at 10 bytes/s
- has a full disk
- is not writable (permissions)
to check that there is timeout handling on the upload process and that failed uploads ARE being caught?
The severity of this issue is best set based on the behaviour when the upload has problems.
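The failure modes listed above could be exercised against a local test double along these lines. This is only a minimal sketch in Python, not the agent's actual upload code: `FaultyReportHandler`, `put_report`, the URL prefixes (`/drop`, `/stall`, `/full`, `/denied`) and `STALL_SECONDS` are all hypothetical names made up for the illustration. The status codes 507 and 403 are an assumption about how a server with a full disk or wrong permissions would answer. Note that the socket timeout used here applies to each network operation, so it catches a stalled server but not one that trickles data at 10 bytes/s; that case needs an overall deadline or a minimum-speed check.

```python
import http.client
import socket
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STALL_SECONDS = 1.0  # hypothetical: how long the fake server hangs on /stall


class FaultyReportHandler(BaseHTTPRequestHandler):
    """Test double: the failure mode is chosen by the URL prefix."""

    def do_PUT(self):
        if self.path.startswith("/drop"):
            # Drop the connection without sending any response.
            self.close_connection = True
            self.connection.shutdown(socket.SHUT_RDWR)
            return
        # Read the uploaded body so the client can finish sending it.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if self.path.startswith("/stall"):
            # Hang, then close without answering.
            time.sleep(STALL_SECONDS)
        elif self.path.startswith("/full"):
            self._reply(507)  # assumed answer for a full disk
        elif self.path.startswith("/denied"):
            self._reply(403)  # assumed answer for a permission problem
        else:
            self._reply(201)

    def _reply(self, status):
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the test output quiet


def put_report(host, port, path, body, timeout=5.0):
    """PUT a report and return (ok, detail) instead of raising."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", path, body=body)
        resp = conn.getresponse()
        resp.read()
        return 200 <= resp.status < 300, f"HTTP {resp.status}"
    except (OSError, http.client.HTTPException) as exc:
        # Covers timeouts, resets and connections closed without a reply.
        return False, type(exc).__name__
    finally:
        conn.close()
```

Starting `ThreadingHTTPServer(("127.0.0.1", 0), FaultyReportHandler)` in a background thread and calling `put_report` once per prefix gives one pass/fail result per failure mode.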
Updated by Alexis Mousset about 5 years ago
- Related to Bug #16112: HTTP report PUT prints useless messages at the end of the run added
Updated by Alexis Mousset about 5 years ago
- Subject changed from Agent shows raw http results to Improve reliability and error reporting of report upload in agent
Thanks a lot for this feedback!
The HTTP result problem has already been fixed in #16112, so I'm renaming this issue to reflect the rest of the comments.
Updated by Alexis Mousset about 5 years ago
Florian Heigl wrote:
The posting of the reports should be wrapped in the agent output; currently it leaks for both "rudder agent run" and "rudder agent inventory".
It can be confusing because at this point the agent has already reported that it finished the run. So it's done, but then it does another thing, which can also fail even though the agent just completed OK.
It uses a different format (HTML ;-) than the agent output.
This output should definitely appear no later than "Summary", and the upload should also be included in the execution time.
(I would agree that a run can be successful even if the reporting fails)
I agree that ideally the report status should appear before the end of the run. The problem is that it is currently hard to do due to the architecture of the wrapping scripts, and because we want the agent to stop execution before actually signing and sending the report (because we pipe its output into the report file we send). We'll try to find a better solution to put the report sending status into the summary, but it may not be before 6.0.0 is released.
In the meantime I'll reformat the output to make the put status look more like a part of the summary (#16245).
Could you please run some tests where the HTTP destination
- drops the connection
- writes at 10 bytes/s
- has a full disk
- is not writable (permissions)
to check that there is timeout handling on the upload process and that failed uploads ARE being caught?
The severity of this issue is best set based on the behaviour when the upload has problems.
When the PUT fails, the report is kept on the node, and the agent has a policy (in system techniques) to send reports that were not properly sent at the end of previous runs. This allows backfilling compliance reports in case of network problems. I opened #16242 to handle the purge of old reports and avoid filling local disks when the server is not available.
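The keep-then-retry behaviour, combined with a purge of old reports, could look roughly like the following. This is a hedged sketch in Python and not the actual system technique: `flush_pending_reports`, `outbox`, `send` and `max_age_seconds` are hypothetical names, and an age-based purge is only one assumed way of implementing what #16242 asks for.

```python
import time
from pathlib import Path


def flush_pending_reports(outbox, send, max_age_seconds, now=None):
    """Retry unsent reports, oldest first, and purge the ones that are too old.

    outbox          -- directory holding reports that were not sent (hypothetical)
    send            -- callable taking a Path, returning True on success
    max_age_seconds -- reports older than this are deleted without being sent
    Returns three lists of file names: (sent, kept, purged).
    """
    now = time.time() if now is None else now
    sent, kept, purged = [], [], []
    pending = sorted(Path(outbox).glob("*.log.gz"),
                     key=lambda p: p.stat().st_mtime)
    for report in pending:
        if now - report.stat().st_mtime > max_age_seconds:
            # Too old to be useful: remove it so the local disk cannot fill up.
            report.unlink()
            purged.append(report.name)
        elif send(report):
            # Delivered: the local copy is no longer needed.
            report.unlink()
            sent.append(report.name)
        else:
            # Still failing: keep it for the next run.
            kept.append(report.name)
    return sent, kept, purged
```

Purging before retrying means a long outage cannot make the backlog grow without bound, at the cost of losing compliance history older than the chosen limit.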
We will run more tests on different failure modes, particularly timeouts and related problems. We are using curl defaults here, but we may want to customize them.
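As an illustration of what customizing the curl defaults could mean, the sketch below builds a curl command line with explicit limits. It is written in Python only to keep the example testable; `curl_put_command` and all the numeric values are hypothetical and are not the values the agent uses. The curl options themselves (`--connect-timeout`, `--max-time`, `--speed-limit`, `--speed-time`, `--fail`, `--upload-file`) are standard.

```python
def curl_put_command(report_path, url, connect_timeout=10, max_time=120,
                     min_speed=100, min_speed_window=30):
    """Build a curl argv that uploads a report with explicit limits.

    All default values are illustrative assumptions, not agent settings.
    """
    return [
        "curl",
        "--silent", "--show-error",   # no progress meter, but print errors
        "--fail",                     # non-zero exit code on HTTP errors
        "--connect-timeout", str(connect_timeout),  # limit on connecting
        "--max-time", str(max_time),                # limit on the whole transfer
        # Abort if the transfer stays below min_speed bytes/s for
        # min_speed_window seconds (the "10 bytes/s" case).
        "--speed-limit", str(min_speed),
        "--speed-time", str(min_speed_window),
        "--output", "/dev/null",      # discard the server's response body
        "--upload-file", report_path,  # sends the file with PUT over HTTP
        url,
    ]
```

Passing the returned list to `subprocess.run` and checking the exit code would then tell the caller whether the report has to be kept for a later retry.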
Updated by Alexis Mousset about 5 years ago
- Target version changed from 6.0.0 to 6.0.1
Updated by Vincent MEMBRÉ about 5 years ago
- Target version changed from 6.0.1 to 6.0.2
Updated by Vincent MEMBRÉ almost 5 years ago
- Target version changed from 6.0.2 to 6.0.3
Updated by Vincent MEMBRÉ almost 5 years ago
- Target version changed from 6.0.3 to 6.0.4
Updated by Vincent MEMBRÉ almost 5 years ago
- Target version changed from 6.0.4 to 6.0.5
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.0.5 to 6.0.6
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.0.6 to 6.0.7
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.0.7 to 6.0.8
Updated by François ARMAND over 4 years ago
- Severity set to Minor - inconvenience | misleading | easy workaround
- User visibility set to Operational - other Techniques | Rudder settings | Plugins
- Priority changed from 0 to 29
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.0.8 to 6.0.9
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 6.0.9 to 6.0.10
- Priority changed from 29 to 28
Updated by Vincent MEMBRÉ about 4 years ago
- Target version changed from 6.0.10 to 798
- Priority changed from 28 to 27
Updated by Benoît PECCATTE over 3 years ago
- Target version changed from 798 to 6.1.14
Updated by Vincent MEMBRÉ over 3 years ago
- Target version changed from 6.1.14 to 6.1.15
Updated by Vincent MEMBRÉ over 3 years ago
- Target version changed from 6.1.15 to 6.1.16
Updated by Vincent MEMBRÉ over 3 years ago
- Target version changed from 6.1.16 to 6.1.17
Updated by Vincent MEMBRÉ about 3 years ago
- Target version changed from 6.1.17 to 6.1.18
Updated by Vincent MEMBRÉ about 3 years ago
- Target version changed from 6.1.18 to 6.1.19
Updated by Alexis Mousset almost 3 years ago
- Status changed from New to Resolved
Closing: the display problem has been fixed and reporting reliability seems acceptable.