User story #5293
closed
Add a 'changes only' compliance mode, only reporting changes on systems
Description
Today, Rudder only supports one reporting mode, which is a compliance assessment mode.
That means that at any time, Rudder knows which execution reports each node should have sent, compares them to what the node actually sent, and based on that displays what is OK and what is wrong.
To be more precise, when a node runs the agent, it executes compliance assessment on a list of rules, each composed of several directives, each composed of several components, each composed of several values, possibly awaiting several messages (execution reports).
Compliance assessment on each component produces reports among "success" (yes!), "error" (the component is not in the correct state and I can't change that), "repaired" (the component was not in the correct state and I corrected it), and other statuses that either fall into one of these three kinds or are not relevant for compliance (log/info messages).
In compliance checking mode (the current, only mode available), Rudder checks that the node sent exactly the expected number of messages through execution reports, even if they are ALL successes, because getting zero, or two or more, successes when exactly one is expected is an error, and getting "success" for unknown components is clearly spurious and needs investigation.
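As a rough illustration of that check, here is a minimal sketch in Python. It is not Rudder's implementation: the function name, the status strings and the verdict labels ("missing", "unexpected", "unknown") are all assumptions made for the example.

```python
# Hypothetical status names; Rudder's real report types may differ.
RESULT_STATUSES = {"success", "repaired", "error"}


def check_full_compliance(expected, reports):
    """Compare what a node sent with what it should have sent.

    expected: {component: number of result reports awaited}
    reports:  list of (component, status) pairs received for one run
    """
    by_component = {}
    for component, status in reports:
        if status in RESULT_STATUSES:  # log/info messages are ignored
            by_component.setdefault(component, []).append(status)

    verdict = {}
    for component, count in expected.items():
        statuses = by_component.get(component, [])
        if not statuses:
            verdict[component] = "missing"      # nothing came back
        elif len(statuses) != count:
            verdict[component] = "unexpected"   # wrong number, even if all success
        elif "error" in statuses:
            verdict[component] = "error"
        elif "repaired" in statuses:
            verdict[component] = "repaired"
        else:
            verdict[component] = "success"

    # Results for components the server never asked for are spurious.
    for component in by_component:
        if component not in expected:
            verdict[component] = "unknown"
    return verdict
```

Note that two "success" reports for a component awaiting one are flagged just like a missing report, which is exactly the strictness that makes this mode chatty.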
That mode is great but generates a lot of traffic on the wire. So sometimes we could afford, or even prefer, a mode where only changes ("error" and "repaired" statuses) are sent back to the server. In that mode, we assume that the absence of an error is a success, which accounts for the "compliance disabled" semantics. In that mode, we will not be able to detect compliance drift as in the compliance mode above, but on the other hand it will be far less chatty.
From a more technical point of view, even in changes-only mode, we will keep the "agent run start / agent run end" system logs that a node sends to indicate that the agent ran during that period, whenever changes have to be notified to the server. We will consider that any reports not sent by the node in that interval are actually success reports.
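The "missing report means success" rule could be sketched as follows. This is only an illustration under assumptions: the function name and status strings are hypothetical, and the rule that "error" wins over "repaired" for the same component is a choice made for the example, not something stated in this ticket.

```python
def interpret_changes_only(expected_components, run_reports):
    """Derive a status per expected component in changes-only mode.

    expected_components: iterable of component names the node should check
    run_reports: (component, status) pairs received between run start and run end
    """
    changes = {}
    for component, status in run_reports:
        if status not in ("error", "repaired"):
            continue  # only changes are meaningful in this mode
        # Assumption: once a component reported an error, keep "error".
        if changes.get(component) != "error":
            changes[component] = status

    # Anything the node stayed silent about is assumed to be a success.
    return {c: changes.get(c, "success") for c in expected_components}
```

Unlike the full compliance check, nothing here can flag a component that was never evaluated, which is the drift-detection loss described above.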
Moreover, some "agent run start / agent run end" logs may not be sent to the server because no changes happened in that run. In that case, periodically, even if everything is OK, a heartbeat message can be sent (surrounded by run start/run end), so that the server knows that the node is alive and is actually checking things.
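On the server side, the heartbeat gives a deadline after which a silent node should be considered as not answering. A minimal sketch, assuming a heartbeat is emitted every `heartbeat_every` agent runs and that some grace period is tolerated (the function name, both parameters and the "no_answer" label are hypothetical):

```python
from datetime import datetime, timedelta


def node_status(last_message_at, now, run_interval, heartbeat_every,
                grace=timedelta(minutes=5)):
    """Return "alive" if the node has been heard from recently enough.

    last_message_at: time of the last run start/run end pair received
    run_interval:    timedelta between two agent runs
    heartbeat_every: number of runs between two heartbeats (assumed setting)
    grace:           extra delay tolerated before declaring the node silent
    """
    deadline = last_message_at + run_interval * heartbeat_every + grace
    return "alive" if now <= deadline else "no_answer"
```

For example, with a 5 minute run interval and a heartbeat every 3 runs, a node last heard at 12:00 is still considered alive at 12:18 but not at 12:21.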
Updated by François ARMAND over 10 years ago
- Status changed from 13 to Discussion
- Assignee changed from François ARMAND to Jonathan CLARKE
I'm wondering if we should not have a three-state mode:
- 1/ "compliance", corresponding to the current mode: every report is sent by the node for its run, including START_RUN, SUCCESS, END_RUN. In that mode, we are able to check for nodes not responding and for the full execution of all rule components compared to the expected configuration (i.e., we are able to detect whether exactly 3 users were checked for, and not 2 or 4).
- 2/ "error only", where SUCCESS reports are not sent, but START_RUN/END_RUN are, and of course ERROR/REPAIRED are too. In that mode, we are also able to check whether a node is not responding, because even in full-SUCCESS runs we should get a START_RUN/END_RUN at the frequency of the agent. In that mode, we are not able to check that exactly all rule components are OK (a missing result counts as SUCCESS, and surplus SUCCESS results won't be reported). This could be a good compromise for infrastructure of normal criticality, where full compliance is too resource-demanding.
- 3/ "silent", where ONLY ERROR/REPAIRED and optionally HEARTBEAT reports are sent. In that mode, when the run is all SUCCESS, NOTHING goes back to the server. So on the server, we can only update the status when an ERROR, REPAIRED or HEARTBEAT arrives. That mode can be extremely useful for infrastructures where the network is limited and bandwidth is a precious resource, or where the nodes are disconnected from the network most of the time (think embedded devices).
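To make the difference between the three modes concrete, here is a small sketch of what a node would put on the wire for one run. The mode strings and the function are hypothetical, and wrapping everything that is sent in START_RUN/END_RUN (including in "silent" mode) is an assumption taken from the description above.

```python
def outgoing_reports(mode, results, heartbeat_due=False):
    """Select the reports a node sends for one agent run.

    mode:    "compliance", "error_only" or "silent" (hypothetical names)
    results: list of (report_type, component) pairs, without run markers
    heartbeat_due: whether a HEARTBEAT should be emitted on this run
    """
    changes = [r for r in results if r[0] in ("ERROR", "REPAIRED")]

    if mode == "compliance":
        body = list(results)            # everything, SUCCESS included
    elif mode == "error_only":
        body = changes                  # markers are sent even if empty
    elif mode == "silent":
        if changes:
            body = changes
        elif heartbeat_due:
            body = [("HEARTBEAT", None)]
        else:
            return []                   # all SUCCESS: nothing goes back
    else:
        raise ValueError("unknown mode: %r" % (mode,))

    # Assumption: anything that is sent is wrapped in run markers.
    return [("START_RUN", None)] + body + [("END_RUN", None)]
```

For an all-SUCCESS run, "compliance" sends every result, "error_only" sends only the two run markers, and "silent" sends nothing unless a heartbeat is due.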
Updated by François ARMAND over 10 years ago
Well, thinking a little more about it, mode #2 is in fact a subcase of mode #3 in which the HEARTBEAT frequency is the same as the agent run frequency.
So we really have only two modes.
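A quick way to convince oneself is to simulate a sequence of runs with a single "heartbeat every N runs" parameter. This is only a sketch: the function, its parameter names and the choice to reset the counter after any message are assumptions made for the example.

```python
def runs_with_messages(run_has_changes, heartbeat_every):
    """For each agent run, tell what goes back to the server.

    run_has_changes: list of booleans, one per run
    heartbeat_every: send a heartbeat after this many runs without a message
    Returns a list containing "changes", "heartbeat" or None per run.
    """
    sent = []
    silent = 0  # consecutive runs that sent nothing
    for has_changes in run_has_changes:
        if has_changes:
            sent.append("changes")
            silent = 0
        elif silent + 1 >= heartbeat_every:
            sent.append("heartbeat")
            silent = 0
        else:
            sent.append(None)
            silent += 1
    return sent
```

With `heartbeat_every=1` no run is ever silent, so the server sees a run start/run end pair at the agent frequency, which is the behaviour of mode #2. Larger values give the quieter mode #3.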
Updated by François ARMAND over 10 years ago
wip branches:
- cf-clerk: https://github.com/fanf/cf-clerk/tree/ust_5293/dev/5399_create_sys_var_node_config
- rudder (update logic): https://github.com/fanf/rudder/tree/ust_5293/dev_5296_add_error_only_report_mode
Updated by François ARMAND over 10 years ago
- Subject changed from Create a "error only - disable compliance" mode to Rudder to Create a "changes only (disable compliance)" mode to Rudder
- Description updated (diff)
Updated by Matthieu CERDA about 10 years ago
- Target version changed from 140 to 3.0.0~beta1
Updated by Jonathan CLARKE about 10 years ago
- Category set to Web - Config management
- Status changed from Discussion to 12
Updated by Jonathan CLARKE about 10 years ago
- Status changed from 12 to Pending release
Updated by Vincent MEMBRÉ about 10 years ago
- Subject changed from Create a "changes only (disable compliance)" mode to Rudder to Add a 'changes only' compliance mode, only reporting changes on systems
Updated by Vincent MEMBRÉ about 10 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 3.0.0~beta1, which was released on 01/12/2014.
- Announcement
- Changelog
- Download information: https://www.rudder-project.org/site/get-rudder/downloads/