Bug #10185
closed
Remote-run exec for root fail with "rudder agent was interrupted"
Added by François ARMAND almost 8 years ago.
Updated over 7 years ago.
Category:
Relay server or API
Description
The message is:
== [fanf@luhman16]
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/root/applyPolicy' -d "classes=inventory"
error Rudder agent was interrupted during execution by a fatal error
Run with -i to see log messages.
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.31s
################################################################################
== [fanf@luhman16]
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/6866e5db-bb41-4110-958b-c1f1c90dbcbe/applyPolicy'
error Rudder agent was interrupted during execution by a fatal error
Run with -i to see log messages.
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.29s
################################################################################
OK, after trying to start it a few more times on the remote node, it started to work. I have no idea what was wrong, and there is no log.
We need to at least add logs so that we can do some forensics when things do not work as expected.
After editing /opt/rudder/share/relay-api/relay_api/remote_run.py on the root server to add "-i" to REMOTE_RUN_COMMAND and restarting Apache, I now get:
rudder info: ........................................................................
rudder info: Hailing server.rudder.local : 5309
rudder info: ........................................................................
error: TRUST FAILED, server presented untrusted key: MD5=3275d8e38205fada95e6236901099527
error: Failed to connect to host: server.rudder.local
error Rudder agent was interrupted during execution by a fatal error
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.30s
################################################################################
We should have an API option that makes this output available.
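One way such an option could look on the relay side, as a minimal sketch only. The base command is a placeholder because the real value of REMOTE_RUN_COMMAND is not shown in this ticket, and `build_remote_run_command`, its `verbose` parameter, and the position of the host argument are all assumptions made for illustration.

```python
# Hypothetical sketch: not the actual contents of relay_api/remote_run.py.
# The real REMOTE_RUN_COMMAND is not shown in this ticket, so a placeholder is used.
REMOTE_RUN_COMMAND = ["remote-run-placeholder"]


def build_remote_run_command(host, verbose=False):
    """Return the argument list for a remote run, adding "-i" when verbose is set."""
    command = list(REMOTE_RUN_COMMAND)  # copy so the module-level default is not mutated
    if verbose:
        command.append("-i")  # the flag that was added by hand in this ticket
    command.append(host)  # assumption: the target host is passed as the last argument
    return command
```

An API parameter could then be mapped to `verbose=True` so that the informational output is only produced when the caller asks for it.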
And sometimes, I don't get anything at all:
== [fanf@luhman16] ==
% curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/c867b070-0721-43d3-8825-d78c51c2c632/applyPolicy'
- Assignee set to Benoît PECCATTE
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
- Tags set to Blocking 4.1
The webapp log shows the following error:
java.lang.NumberFormatException: For input string: "Error when trying to contact internal remote-run API: null"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Byte.parseByte(Byte.java:149)
at java.lang.Byte.parseByte(Byte.java:175)
at scala.collection.immutable.StringLike.toByte(StringLike.scala:297)
at scala.collection.immutable.StringLike.toByte$(StringLike.scala:297)
at scala.collection.immutable.StringOps.toByte(StringOps.scala:29)
at com.normation.rudder.web.rest.node.NodeApiService8.runResponse(NodeAPIService8.scala:119)
at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4(NodeAPIService8.scala:155)
at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4$adapted(NodeAPIService8.scala:155)
at net.liftweb.http.LiftServlet.sendResponse(LiftServlet.scala:1040)
at net.liftweb.http.LiftServlet.doService(LiftServlet.scala:451)
at net.liftweb.http.LiftServlet.$anonfun$service$2(LiftServlet.scala:157)
at net.liftweb.util.TimeHelpers.calcTime(TimeHelpers.scala:427)
...
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:369)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:464)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:913)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:975)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:641)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
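The trace shows a `toByte` conversion being applied to an error message rather than to numeric data. As an illustration only (the webapp is written in Scala; this Python sketch is not the webapp's code, and `parse_byte_or_error` is a hypothetical helper), the conversion can be guarded so that non-numeric text is reported as an error instead of raising:

```python
def parse_byte_or_error(text):
    """Return (value, None) for a signed byte, or (None, message) otherwise."""
    try:
        value = int(text)
    except ValueError:
        # Non-numeric input, such as an error message, ends up here.
        return None, "not a number: %r" % text
    if not -128 <= value <= 127:
        return None, "out of byte range: %d" % value
    return value, None
```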
Ah, this message is probably more relevant:
Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Connection was hung up while receiving line:
Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Client closed connection early! He probably does not trust our key...
The full verbose output is:
rudder verbose: Obtained IP address of '127.0.0.1' on socket 7 from accept
rudder verbose: New connection (from 127.0.0.1, sd 7), spawning new thread...
rudder info: 127.0.0.1> Accepting connection
rudder verbose: 127.0.0.1> Setting socket timeout to 600 seconds.
rudder verbose: 127.0.0.1> Peeked nothing important in TCP stream, considering the protocol as TLS
rudder verbose: 127.0.0.1> TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder verbose: 127.0.0.1> TLS session established, checking trust...
rudder verbose: 127.0.0.1> Remote peer terminated TLS session (SSL_read)
error: 127.0.0.1> Connection was hung up while receiving line:
notice: 127.0.0.1> Client closed connection early! He probably does not trust our key..
Agent side:
rudder verbose: Connected to host 192.168.41.2 address 192.168.41.2 port 5309 (socket descriptor 4)
rudder verbose: TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder verbose: TLS session established, checking trust...
rudder verbose: Did not find new key format '/var/rudder/cfengine-community/ppkeys/root-MD5=57ccba22df018012132877618ff655f9.pub'
rudder verbose: Trying old style '/var/rudder/cfengine-community/ppkeys/root-192.168.41.2.pub'
rudder verbose: Received key 'MD5=57ccba22df018012132877618ff655f9' not found in ppkeys
error: TRUST FAILED, server presented untrusted key: MD5=57ccba22df018012132877618ff655f9
rudder verbose: Connection to 192.168.41.2 is closed
error: Failed to connect to host: 192.168.41.2
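The agent-side log shows the lookup order: first a key file named after the digest, then an old-style file named after the address, and a trust failure when neither matches. A minimal sketch of that check, assuming only the two file-name patterns visible in the log. The function name and its parameters are hypothetical, this is not CFEngine's implementation, and the real agent presumably also compares key contents rather than only checking that a file exists.

```python
from pathlib import Path

PPKEYS_DIR = Path("/var/rudder/cfengine-community/ppkeys")


def find_trusted_key(digest, address, user="root", ppkeys_dir=PPKEYS_DIR):
    """Return the path of a matching public key file, or None if none is found."""
    candidates = [
        ppkeys_dir / f"{user}-{digest}.pub",   # new format, e.g. root-MD5=<digest>.pub
        ppkeys_dir / f"{user}-{address}.pub",  # old style, e.g. root-192.168.41.2.pub
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_trusted_key("MD5=57ccba22df018012132877618ff655f9", "192.168.41.2")` on the agent would show whether either file is present before a remote run is attempted.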
OK, so after some tests:
- we cannot remote-run the server on itself; cf-runagent doesn't seem to support it
- remote-running a 4.1 node is OK
- remote-running a 4.0 node fails, as the command is not valid:
root@agent1:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand
Rudder agent 4.0.4.rc1.git201702280322 (CFEngine Core 3.7.4)
Node uuid: e04cdc24-2180-4d2e-b334-0445a13a3a45
ok: Rudder agent promises were updated.
error: Remote execution cannot ignore locks
- remote-running a 3.1 node fails, as the command /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand --inform is not valid:
root@agent2:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -Dcfruncommand --inform
/opt/rudder/share/commands/agent-run: illegal option -- u
/opt/rudder/share/commands/agent-run: illegal option -- -
/opt/rudder/share/commands/agent-run: illegal option -- n
/opt/rudder/share/commands/agent-run: illegal option -- o
Rudder agent 3.1.19.rc1.git201702210714 (CFEngine Core 3.6.5)
Node uuid: 791a6ebe-cfb1-4f54-b9a2-48ca162f64b6
2017-02-28T12:38:05+0000 error: Remote execution cannot ignore locks
So the remote-run API should not try to remote-run on the local system, and we need a fix for 4.0 and 3.1 compatibility.
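The first conclusion could be handled with a guard in front of the remote-run call. A rough sketch, assuming the local server is identified by the node ID `root` as in the API calls at the top of this ticket. `choose_run_mode` is a hypothetical helper. What the API should actually do for the local node, and the version-specific commands for 4.0 and 3.1 agents, are left out because this ticket does not state them.

```python
LOCAL_NODE_ID = "root"  # assumption: the server's own node ID, as used in the curl calls


def choose_run_mode(node_id, local_node_id=LOCAL_NODE_ID):
    """Return "local" when the target is the server itself, else "remote"."""
    # cf-runagent does not seem to support targeting the local host,
    # so the caller should not attempt a remote run in that case.
    return "local" if node_id == local_node_id else "remote"
```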
- Tags deleted (Blocking 4.1)
- Category set to Relay server or API
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
I'm leaving this ticket open to change the relay API so that it makes the correct call to rudder agent. I'm opening a subticket to fix the null pointer exception on the Rudder side, which should not happen.
- Status changed from New to In progress
- Assignee changed from Benoît PECCATTE to Alexis Mousset
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1270
- Status changed from Pending technical review to Pending release
- Subject changed from Remote-run exec for root and nodes behind relays fail with "rudder agent was interrupted" to Remote-run exec for root fail with "rudder agent was interrupted"
- Status changed from Pending release to Released
This bug has been fixed in Rudder 4.1.0~rc1 which was released today.