Question #8176
closed
All nodes compliance report unexpected/missing except root server.
Description
All the nodes we have connected are reporting as 50% missing / 50% unexpected for all compliance reports.
Example:
The CFengine binaries in /var/rudder/cfengine-community/bin are up to date
Unexpected
Files
Updated by Vincent MEMBRÉ almost 9 years ago
Hello Siemen, thanks for reporting your issue!
Is there a time difference between the node and the root server?
Can you share a screenshot of the entries in the technical logs tab?
Is the reporting on the root server OK?
Updated by siemen Meijssen almost 9 years ago
Thanks for the quick reply,
What exactly do you mean by time delay? If you mean network-wise, ping times are less than 1 ms. The time on all servers is configured correctly.
How would I be able to get this screenshot?
The clients report to the root server without a problem (the last seen time is updated every 5 minutes).
Updated by Vincent MEMBRÉ almost 9 years ago
Sorry about the late reply, I missed your answer.
For reporting to be OK, the date needs to be synchronized between the server and the nodes (run the 'date' command on both; if you see a difference, you have a problem).
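To make the 'date' comparison concrete, here is a minimal sketch in Python. It takes the two timestamps you read on the node and on the server and tells you how far apart they are. The `MAX_SKEW` tolerance and the function names are assumptions made for illustration; this thread does not say how much drift Rudder tolerates.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance, chosen only for this example.
# The thread does not state how much clock drift Rudder accepts.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(node_time: datetime, server_time: datetime) -> timedelta:
    """Return the absolute difference between two timestamps.

    Both values must be timezone-aware (or both naive); mixing the two
    raises TypeError.
    """
    return abs(node_time - server_time)


def clocks_in_sync(node_time: datetime, server_time: datetime,
                   max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the two clocks differ by at most max_skew."""
    return clock_skew(node_time, server_time) <= max_skew


if __name__ == "__main__":
    # Made-up example values standing in for the output of 'date' on each host.
    server = datetime(2016, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
    node = datetime(2016, 4, 1, 12, 7, 30, tzinfo=timezone.utc)
    print("skew:", clock_skew(node, server))
    print("in sync:", clocks_in_sync(node, server))
```

Comparing in UTC avoids false alarms when the node and the server are simply configured with different time zones.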
About the "technical logs" tab: on a node's detail page, click the "technical logs" tab, one of the rightmost tabs, and take a screenshot of the table displayed.
Was it working before, or is it a new install?
Updated by siemen Meijssen over 8 years ago
- File Rudder.PNG added
There were indeed some problems with the date settings. These have been corrected, but it is still not working (after 40 minutes with reporting every 5 minutes).
See the file attached.
This is an entirely new install on Debian.
After I run rudder agent reset, it reports correctly once; afterwards it reports as unexpected/missing again.
My apologies for the late reply.
Updated by siemen Meijssen over 8 years ago
- File rudder2.PNG added
I noticed that the server which is not running correctly displays the following error when run manually (see attached).
I also noticed that the other server keeps repairing the same error but is otherwise running successfully and is now reporting as it should (for at least 25 minutes).
Updated by Vincent MEMBRÉ over 8 years ago
Thanks for your screenshots!
So from your two screenshots, I can see that the node could not update its policies, so it is still using an old reporting format.
Two questions:
- When running 'rudder agent update' on the faulty node, do you get an error?
If there is an error, we have a tool on the Rudder server: run 'rudder server debug <ip-faulty-node>', then run 'rudder agent run' on the node. Can you share the output?
A common update error is that the Rudder server does not resolve the node hostname correctly; that may be the case here (to check, run 'getent hostname-of-your-node' on the server).
Updated by siemen Meijssen over 8 years ago
I get the error:
error: Method 'update_action failed in some repairs
see attached
When I run getent I get the error:
Unknown database: <name of node>
I have also noticed that the servers switch around: whenever one is working, the other one isn't.
Updated by Vincent MEMBRÉ over 8 years ago
Oops, I told you the wrong commands, sorry! :(
It's 'rudder agent update' you need to run after running rudder-server-debug, not 'rudder agent run'!
And the getent command is "getent hosts <hostname-of-your-node>".
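If you want to script the same check, the following is a rough Python equivalent of a "getent hosts" lookup, not a Rudder tool. It asks the system resolver for the address of a hostname and returns None when nothing is found, which is roughly what "getent hosts" printing no output means. The function name `lookup_host` and the injectable `resolver` parameter are assumptions made for this sketch.

```python
import socket


def lookup_host(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address the system resolver gives for hostname.

    Returns None if the name does not resolve. The resolver argument is
    only there so the function can be exercised without real DNS.
    """
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return None


if __name__ == "__main__":
    # "hostname-of-your-node" is a placeholder, as in the command above.
    address = lookup_host("hostname-of-your-node")
    if address is None:
        print("the server cannot resolve this hostname")
    else:
        print("resolved to", address)
```

On a typical Linux setup the system resolver also consults /etc/hosts, so this should usually agree with what getent reports, but that depends on the host's name service configuration.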
Updated by Vincent MEMBRÉ over 8 years ago
From the logs, I can see that your node is identified as debian-test. Is that correct, or should it be the other one?
Updated by siemen Meijssen over 8 years ago
Now it is my turn to apologize. I uploaded the wrong log file. I'll update you ASAP.
Updated by siemen Meijssen over 8 years ago
The getent command returns no output.
The server which is not reporting is Stream-Server.
The Debian-Test server also didn't report for some time, but after running an apt-get dist-upgrade and rudder agent reset it is working now.
Updated by siemen Meijssen over 8 years ago
I did another reset/reinit on the Stream-Server.
It has been reporting again for at least 20 minutes now. I'll let you know if it stays that way.
Updated by Vincent MEMBRÉ over 8 years ago
Your server cannot determine that your stream-server IP belongs to stream-server; you need to help it find out.
The easiest way is to add a line for stream-server to the /etc/hosts file of your Rudder server.
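As an illustration of what such an /etc/hosts line looks like and how to check whether one is already present, here is a small Python sketch. The address 192.0.2.10 is a made-up documentation address, not the real IP of stream-server, and the helper names are hypothetical.

```python
def hosts_entries(hosts_text):
    """Parse /etc/hosts-style text into a {hostname: ip} mapping.

    Comments and blank lines are ignored. If a name appears more than
    once, the first entry is kept.
    """
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def hosts_line(ip, hostname):
    """Format one /etc/hosts entry: address first, then the name."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    # Example file content; 192.0.2.10 is a placeholder address.
    current = "127.0.0.1\tlocalhost\n# nodes managed by Rudder\n"
    if "stream-server" not in hosts_entries(current):
        print("line to add:", hosts_line("192.0.2.10", "stream-server"))
```

The sketch only prints the line instead of writing to /etc/hosts, so you can review it before editing the file as root.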
Updated by siemen Meijssen over 8 years ago
That is weird, because it is reporting correctly now.
If this were the issue, it would fail all the time, right?
Why would the Rudder master need to have the names of the client hosts? I thought this was all done over IP.
Updated by siemen Meijssen over 8 years ago
I think this issue has been resolved. I have no clue what caused it to report that way, but apparently it is fixed in the latest release.
Updated by Vincent MEMBRÉ over 8 years ago
- Tracker changed from Bug to Question
- Status changed from New to Resolved
Great that it's working now ... but you're right, it's weird that you had to do all those things to make things right.
We rely on name resolution, and Rudder needs to know each node's hostname to authorize it correctly (we are thinking about changing that behavior, but we are not there yet...).
You can disable these DNS lookups by unticking 'Use reverse DNS lookups on nodes to reinforce authentication to policy server' on the Administration/Settings page; Rudder will then authenticate nodes using their IP only (so any node with the same IP will have access to the node's promises, which can be a security issue).
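To give an idea of what a reverse DNS based check amounts to in general, here is a generic sketch in Python. It is not Rudder's or CFEngine's actual implementation, just an assumed illustration of the idea: look up the name behind the connecting IP, compare it with the hostname the node is registered under, then confirm that this hostname resolves back to the same IP. The function name and the injectable `reverse` and `forward` parameters are hypothetical.

```python
import socket


def reverse_dns_matches(ip, expected_hostname,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to expected_hostname and that
    hostname resolves forward to the same ip.

    reverse and forward default to the system resolver; they can be
    replaced to try the logic without real DNS.
    """
    try:
        name, aliases, _ = reverse(ip)
    except OSError:
        return False  # no reverse record: the check fails
    candidates = {n.lower() for n in [name, *aliases]}
    if expected_hostname.lower() not in candidates:
        return False
    try:
        _, _, addresses = forward(expected_hostname)
    except OSError:
        return False
    return ip in addresses
```

A node whose IP has no reverse record, or whose name is unknown to the server, fails this kind of check even though plain IP connectivity works, which would match the symptoms discussed in this issue.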
If you still have problems in the future, feel free to reopen this issue.
Thanks :)
Updated by François ARMAND over 8 years ago
I'm wondering whether it could be linked to #8051, too? The symptoms are quite alike.
Updated by François ARMAND over 8 years ago
- Related to Bug #8051: Compliance is not correctly computed if we receive run agent right after generation added
Updated by siemen Meijssen over 8 years ago
I did indeed run that command, so it might be related.
Updated by François ARMAND over 8 years ago
- Related to Bug #7336: Node stuck in "Applying" status added