User story #4234

closed

Add online|offline check before calculating status

Added by Dennis Cabooter about 11 years ago. Updated 4 months ago.

Status:
Resolved
Priority:
N/A
Assignee:
-
Category:
Web - Compliance & node report
UX impact:
Suggestion strength:
User visibility:
Effort required:
Name check:
Fix check:
Regression:
No

Description

Would it be an idea to check whether a node is online before calculating its status? When managing desktops, the machines will be offline sometimes, so I have no way to tell whether a node is offline or whether rudder-agent is malfunctioning. I'm going to manage 20+ desktops with Rudder, and they will be offline several times. I can assume that a machine in the "No answer" state is offline, but it could also be that rudder-agent is broken on that node.

Some thoughts on IRC:

16:30 < jooooooon> dnns: it's not always possible to check if a node is 
                   offline, because of firewalls rules, network topology etc
16:30 < jooooooon> :/
16:34 < dnns> jooooooon: it's maybe not always possible to see if a node's 
              online by pinging. but maybe the node could send a message with 
              curl to the rudder server to say hi' i'm online
16:34 < jooooooon> that's kinda the logic we already apply with reports tho, no?
16:35 < dnns> jooooooon: how can i see the difference between a node which is 
              offline and a node with a disfunctional rudder-agent/rsyslog?
16:36 < jooooooon> dnns: ahhh, I see what you mean
16:37 < jooooooon> any ideas on how to display that differently?
16:38 < dnns> jooooooon: Success | Repaired | Error | No Answer | Offline
16:38 < dnns> ?
16:38 < jooooooon> I like it :)
16:39 < jooooooon> I just worry that we can't really *know* a node is offline
16:39 < jooooooon> but I suppose a node that doesn't contact the Rudder server 
                   is pretty much offline
16:39 < jooooooon> maybe "No answer" should be renamed too?
16:40 < ncharles> maybe we could have some kind of snmp probe ?
16:40 < Kegeruneku> Like uh, a ping probe ?
16:41 < Kegeruneku> instead
16:41 < dnns> well, no answer can also mean that the node is up but doesn't send 
              out logs
16:44 < jooooooon> but we can't really differentiate between that scenario and 
                   "offline" 
16:45 < Kegeruneku> Well, Off line = Off the line = No connection between two 
                    peers
16:45 < Kegeruneku> It's not really wrong :

Related issues 1 (1 open, 0 closed)

Related to Rudder - Architecture #24963: Persist compliance in base to know last state for a long time | Pending release | François ARMAND
Actions #1

Updated by Erwin Vrolijk about 11 years ago

There is really no way to differentiate a non-functioning node from an offline node if the ping (curl or whatever, from node to server) is sent by the main rudder agent.
This can be sidestepped by relying on a different process, such as cron, for the pinging. This is a bit ugly, but cron is already a requirement for the rudder agent.

My proposal would be to use the bundled curl to regularly send a ping to the rudder server via HTTP POST. This process must not have any dependencies on the rudder agent, cfengine or rsyslog and must be controlled via cron. The cron entry is added during the installation of the rudder agent.
The HTTP POST could simply contain only the node's rudderid and a NOOP or Keepalive message.

A node's status can become offline when no ping is received for 2x the ping interval configured in cron.
The HTTP POST messages can be turned into a technical log by the rudder server and appended to the node's log. This allows for debugging of the ping itself, for instance when the rudder agent is working fine but the pinging is not.
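
To make the proposed rule concrete, here is a minimal sketch of how a server could classify a node from the time of its last ping and its last report. It is only an illustration of the "2x the ping interval" idea above, not Rudder code: the function name, the status strings and the `report_timeout` threshold are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_node(
    now: datetime,
    last_ping: Optional[datetime],
    last_report: Optional[datetime],
    ping_interval: timedelta,
    report_timeout: timedelta,
) -> str:
    """Return "offline", "no answer" or "reporting" for one node.

    Hypothetical helper: names and status strings are illustrative only.
    """
    # Rule from the proposal: the node is offline when no ping has been
    # received for more than 2x the ping interval configured in cron.
    if last_ping is None or now - last_ping > 2 * ping_interval:
        return "offline"
    # Pings still arrive, so the machine is up. Missing or stale reports
    # then point at the agent or rsyslog. report_timeout is an assumed
    # threshold, not something defined in this ticket.
    if last_report is None or now - last_report > report_timeout:
        return "no answer"
    return "reporting"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    status = classify_node(
        now,
        last_ping=now - timedelta(minutes=3),
        last_report=now - timedelta(hours=2),
        ping_interval=timedelta(minutes=5),
        report_timeout=timedelta(minutes=15),
    )
    print(status)  # prints "no answer": pings arrive, reports do not
```

Because the ping would be sent by cron independently of the agent, a node that pings but does not report can be shown as "no answer" rather than "offline", which is the distinction this ticket asks for.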

Actions #2

Updated by Vincent MEMBRÉ about 11 years ago

  • Status changed from New to Discussion
  • Target version set to Ideas (not version specific)

Thanks to both of you for your ideas and proposal.

It would definitely be a good thing to be able to determine whether a node is shut down or whether it has issues sending reports.

And I like your idea, Erwin, of sending a "ping" from each agent that would be transformed into a report from the node (with a dedicated API on the server).

However, this is very tricky: if the node cannot send reports, it may not be able to send that signal either, leading to a false "offline" instead of "no answer".

I have no idea what the best solution would be here, or what should be done.

Everyone, what do you think about this feature? Do you have any problem with it, or any more ideas to add?

Actions #3

Updated by Jonathan CLARKE about 11 years ago

  • Status changed from Discussion to New
  • Target version deleted (Ideas (not version specific))

I like the idea. Sure, Vincent, you're right that if network conditions are adverse, the "ping by curl" won't work any more than sending reports does, BUT there are many cases where syslog reports and/or rudder-agent can fail to send, while a simple HTTP ping could get through. This wouldn't be foolproof, but it could be nice to have.

Actions #4

Updated by Olivier Mauras about 11 years ago

Please make it an option and not a requirement :)

Actions #5

Updated by Benoît PECCATTE over 9 years ago

  • Category set to Web - Compliance & node report
  • Target version set to Ideas (not version specific)
Actions #6

Updated by François ARMAND 7 months ago

  • Related to Architecture #24963: Persist compliance in base to know last state for a long time added
Actions #7

Updated by François ARMAND 4 months ago

  • Status changed from New to Resolved
  • Regression set to No

We can't really check if the node is online, and we want to avoid direct node calls from the relay.
So we added the possibility for the user to control how long a node's last compliance should be kept before the node is considered unavailable.
With #24963, people can ask Rudder to report a problem (grey node) after 10 minutes without reports for a server, but say it's OK to wait 4 days for a laptop.

I'm closing this one, since we won't do more on the subject without new expressed needs.
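
As a rough illustration of the per-node grace period described above, the following sketch flags a node as unavailable once its last report is older than the delay allowed for that kind of node. The 10 minutes and 4 days come from the comment above. The dictionary, the node kinds used as keys and the function name are hypothetical and do not reflect the actual settings introduced by #24963.

```python
from datetime import datetime, timedelta

# Hypothetical grace periods per kind of node. The values are the ones
# quoted in the comment above; the keys are made up for this example.
GRACE_PERIOD = {
    "server": timedelta(minutes=10),
    "laptop": timedelta(days=4),
}


def compliance_is_stale(now: datetime, last_report: datetime, node_kind: str) -> bool:
    """Return True when the node should be shown as unavailable (grey)."""
    # The last known compliance is kept until the grace period has elapsed.
    return now - last_report > GRACE_PERIOD[node_kind]


if __name__ == "__main__":
    now = datetime(2024, 1, 5, 12, 0)
    one_hour_ago = now - timedelta(hours=1)
    print(compliance_is_stale(now, one_hour_ago, "server"))  # prints True
    print(compliance_is_stale(now, one_hour_ago, "laptop"))  # prints False
```

With the same one-hour silence, a server is already reported as a problem while a laptop keeps its last compliance, which matches the behaviour described in the closing comment.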
