Bug #10185
Remote-run exec for root fail with "rudder agent was interrupted" [closed]
Description
The message is:
[fanf@luhman16] % curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/root/applyPolicy' -d "classes=inventory"
error Rudder agent was interrupted during execution by a fatal error
Run with -i to see log messages.
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.31s
################################################################################

[fanf@luhman16] % curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/6866e5db-bb41-4110-958b-c1f1c90dbcbe/applyPolicy'
error Rudder agent was interrupted during execution by a fatal error
Run with -i to see log messages.
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.29s
################################################################################
OK, after trying to start it a few more times on the remote node, it started to work. I have no idea what was wrong, and there is no log.
We need to at least add logs so that we can do some forensics when things do not work as expected.
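As a starting point for those logs, here is a minimal Python sketch that wraps the agent call so the command line, exit code and combined output are always logged. It assumes the relay API runs the agent through subprocess; the logger name and the run_and_log helper are hypothetical, not existing relay API code.

```python
import logging
import subprocess
import sys

# Hypothetical logger name for the relay API remote-run module.
logger = logging.getLogger("relay_api.remote_run")


def run_and_log(command):
    """Run `command` (a list of args), log the outcome, return (returncode, output)."""
    logger.info("Running remote-run command: %s", " ".join(command))
    # Merge stderr into stdout so the agent's error lines are not lost.
    proc = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.stdout.decode("utf-8", errors="replace")
    if proc.returncode != 0:
        logger.error("Command exited with code %d, output:\n%s", proc.returncode, output)
    else:
        logger.debug("Command succeeded, output:\n%s", output)
    return proc.returncode, output


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    run_and_log([sys.executable, "-c", "print('hello')"])
```

Logging the full output on failure is what would have shown the "TRUST FAILED" line without hand-editing the command.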
Updated by François ARMAND almost 8 years ago
After editing /opt/rudder/share/relay-api/relay_api/remote_run.py on the root server to add "-i" to REMOTE_RUN_COMMAND, and restarting Apache, I now get:
rudder info: ........................................................................
rudder info: Hailing server.rudder.local : 5309
rudder info: ........................................................................
error: TRUST FAILED, server presented untrusted key: MD5=3275d8e38205fada95e6236901099527
error: Failed to connect to host: server.rudder.local
error Rudder agent was interrupted during execution by a fatal error
## Summary #####################################################################
0 components verified in 0 directives
execution time: 0.30s
################################################################################
We should have an API option to get that output back.
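One possible shape for that option, as a minimal Python sketch: the "-i" flag is added per request instead of being hard-coded. The real value of REMOTE_RUN_COMMAND is not quoted in this ticket, so the one below is a placeholder, and the `verbose` parameter name and both helpers are hypothetical.

```python
# Placeholder: the real REMOTE_RUN_COMMAND in remote_run.py is not quoted in this ticket.
REMOTE_RUN_COMMAND = ["remote-run"]

# Hypothetical request parameter name for the proposed API option.
VERBOSE_PARAM = "verbose"


def is_truthy(value):
    """Interpret typical query-string booleans ("true", "1", "yes", "on")."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def build_command(node_host, params):
    """Return the argv to run; -i is added only when the caller asks for verbose output."""
    cmd = list(REMOTE_RUN_COMMAND)
    if is_truthy(params.get(VERBOSE_PARAM, "false")):
        cmd.append("-i")  # the flag that was added by hand, now opt-in per request
    cmd.append(node_host)
    return cmd
```

`params` can be any mapping, such as the parsed query string of the API call; where exactly "-i" must sit in the real command line is an assumption here.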
Updated by François ARMAND almost 8 years ago
And sometimes, I don't get anything at all:
[fanf@luhman16] % curl -k -H "X-API-Token: 5qsPVwcoa99sZfnSn3A6ive9Q7PMUzRx" -X POST 'https://192.168.44.2/rudder/api/latest/nodes/c867b070-0721-43d3-8825-d78c51c2c632/applyPolicy'
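To reproduce these applyPolicy calls from a script, the same request can be built with Python's standard urllib. This is a minimal sketch: the helper names are hypothetical, the token in the usage comment is a placeholder, and insecure_context() mirrors curl -k, so it disables certificate checks and is only suitable for test setups like this one.

```python
import ssl
import urllib.parse
import urllib.request


def build_apply_policy_request(server, token, node_id, classes=None):
    """Build (but do not send) the POST to /rudder/api/latest/nodes/<node_id>/applyPolicy."""
    url = "https://%s/rudder/api/latest/nodes/%s/applyPolicy" % (
        server, urllib.parse.quote(node_id))
    # Same form body as curl -d "classes=inventory"; no body when classes is None.
    data = urllib.parse.urlencode({"classes": classes}).encode() if classes else None
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("X-API-Token", token)
    return req


def insecure_context():
    """Equivalent of curl -k: no certificate or hostname verification (tests only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Usage (actually sends the request, needs a reachable server):
# req = build_apply_policy_request("192.168.44.2", "YOUR_API_TOKEN", "root", "inventory")
# with urllib.request.urlopen(req, context=insecure_context(), timeout=60) as resp:
#     print(resp.read().decode("utf-8", errors="replace"))
```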
Updated by François ARMAND over 7 years ago
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
Updated by François ARMAND over 7 years ago
- Tags set to Blocking 4.1
Updated by Nicolas CHARLES over 7 years ago
The webapp log shows the following error:
java.lang.NumberFormatException: For input string: "Error when trying to contact internal remote-run API: null"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Byte.parseByte(Byte.java:149)
    at java.lang.Byte.parseByte(Byte.java:175)
    at scala.collection.immutable.StringLike.toByte(StringLike.scala:297)
    at scala.collection.immutable.StringLike.toByte$(StringLike.scala:297)
    at scala.collection.immutable.StringOps.toByte(StringOps.scala:29)
    at com.normation.rudder.web.rest.node.NodeApiService8.runResponse(NodeAPIService8.scala:119)
    at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4(NodeAPIService8.scala:155)
    at com.normation.rudder.web.rest.node.NodeApiService8.$anonfun$runNode$4$adapted(NodeAPIService8.scala:155)
    at net.liftweb.http.LiftServlet.sendResponse(LiftServlet.scala:1040)
    at net.liftweb.http.LiftServlet.doService(LiftServlet.scala:451)
    at net.liftweb.http.LiftServlet.$anonfun$service$2(LiftServlet.scala:157)
    at net.liftweb.util.TimeHelpers.calcTime(TimeHelpers.scala:427)
    ...
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:369)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:464)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:913)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:975)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:641)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:231)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
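The trace suggests that NodeApiService8.runResponse passes the relay's reply straight to toByte, as if it were the agent's exit code, so an error message such as "Error when trying to contact internal remote-run API: null" throws instead of being reported. The sketch below only illustrates the defensive pattern, in Python rather than the webapp's Scala: accept the reply as an exit code only when it parses as an integer in Byte range (-128..127), otherwise surface the text as an error. parse_exit_code is a hypothetical helper.

```python
def parse_exit_code(body):
    """Return (exit_code, None) for a byte-sized integer reply, else (None, error_message)."""
    text = body.strip()
    try:
        code = int(text)
    except ValueError:
        # Not a number at all: report the relay's message instead of crashing.
        return None, "Remote-run API did not return an exit code: %r" % text
    if not -128 <= code <= 127:
        # Same range that Scala's toByte / Java's Byte.parseByte accept.
        return None, "Exit code out of byte range: %d" % code
    return code, None
```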
Updated by Nicolas CHARLES over 7 years ago
Ha, this message is probably more relevant:
Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Connection was hung up while receiving line:
Feb 28 12:02:32 server cf-serverd[1583]: CFEngine(server) rudder 127.0.0.1> Client closed connection early! He probably does not trust our key...
Updated by Nicolas CHARLES over 7 years ago
The full verbose output (server side) is:
rudder verbose: Obtained IP address of '127.0.0.1' on socket 7 from accept
rudder verbose: New connection (from 127.0.0.1, sd 7), spawning new thread...
rudder info: 127.0.0.1> Accepting connection
rudder verbose: 127.0.0.1> Setting socket timeout to 600 seconds.
rudder verbose: 127.0.0.1> Peeked nothing important in TCP stream, considering the protocol as TLS
rudder verbose: 127.0.0.1> TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder verbose: 127.0.0.1> TLS session established, checking trust...
rudder verbose: 127.0.0.1> Remote peer terminated TLS session (SSL_read)
error: 127.0.0.1> Connection was hung up while receiving line:
notice: 127.0.0.1> Client closed connection early! He probably does not trust our key..
Updated by Nicolas CHARLES over 7 years ago
Agent side:
rudder verbose: Connected to host 192.168.41.2 address 192.168.41.2 port 5309 (socket descriptor 4)
rudder verbose: TLS version negotiated: TLSv1.2; Cipher: AES256-GCM-SHA384,TLSv1/SSLv3
rudder verbose: TLS session established, checking trust...
rudder verbose: Did not find new key format '/var/rudder/cfengine-community/ppkeys/root-MD5=57ccba22df018012132877618ff655f9.pub'
rudder verbose: Trying old style '/var/rudder/cfengine-community/ppkeys/root-192.168.41.2.pub'
rudder verbose: Received key 'MD5=57ccba22df018012132877618ff655f9' not found in ppkeys
error: TRUST FAILED, server presented untrusted key: MD5=57ccba22df018012132877618ff655f9
rudder verbose: Connection to 192.168.41.2 is closed
error: Failed to connect to host: 192.168.41.2
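The agent log shows the order in which trust is checked: first the new key file name root-MD5=<digest>.pub, then the old-style root-<ip>.pub, and only then TRUST FAILED. A small Python sketch of that lookup, handy for inspecting a ppkeys directory by hand; find_trusted_key is hypothetical, and it only checks that a file with the expected name exists, it does not look at the key contents.

```python
import os


def find_trusted_key(ppkeys_dir, user, key_digest, host_ip):
    """Return the first matching key file, in the order seen in the agent log, or None."""
    candidates = [
        # New key format, e.g. root-MD5=<digest>.pub
        os.path.join(ppkeys_dir, "%s-%s.pub" % (user, key_digest)),
        # Old style, e.g. root-192.168.41.2.pub
        os.path.join(ppkeys_dir, "%s-%s.pub" % (user, host_ip)),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # the case that ends in "TRUST FAILED, server presented untrusted key"
```

Usage: find_trusted_key("/var/rudder/cfengine-community/ppkeys", "root", "MD5=57ccba22df018012132877618ff655f9", "192.168.41.2").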
Updated by Nicolas CHARLES over 7 years ago
- the root server cannot remote-run on itself; cf-runagent doesn't seem to support it
- remote-running a 4.1 node works
- remote-running a 4.0 node fails, as the command is not valid:
root@agent1:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand
Rudder agent 4.0.4.rc1.git201702280322 (CFEngine Core 3.7.4)
Node uuid: e04cdc24-2180-4d2e-b334-0445a13a3a45
ok: Rudder agent promises were updated.
error: Remote execution cannot ignore locks
- remote-running a 3.1 node fails, as the command /opt/rudder/bin/rudder agent run -uR -I -Dcfruncommand --inform is not valid (the French "option non permise" messages mean "illegal option"):
root@agent2:/home/vagrant# /opt/rudder/bin/rudder agent run -uR -Dcfruncommand --inform
/opt/rudder/share/commands/agent-run : option non permise -- u
/opt/rudder/share/commands/agent-run : option non permise -- -
/opt/rudder/share/commands/agent-run : option non permise -- n
/opt/rudder/share/commands/agent-run : option non permise -- o
Rudder agent 3.1.19.rc1.git201702210714 (CFEngine Core 3.6.5)
Node uuid: 791a6ebe-cfb1-4f54-b9a2-48ca162f64b6
2017-02-28T12:38:05+0000 error: Remote execution cannot ignore locks
So, the remote-run API should not try to remote-run on the local system, and we need a fix for 4.0 and 3.1 compatibility.
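Until the command is fixed for older agents, the API could refuse these cases up front with a clear message instead of "Rudder agent was interrupted". A minimal Python sketch of such a guard, based only on the observations in this comment (root cannot remote-run itself, 4.1 works, 4.0 and 3.1 fail); check_remote_run_target is hypothetical, and treating every version from 4.1 upward as supported is an assumption.

```python
def check_remote_run_target(node_id, agent_version):
    """Return None if remote-run looks supported, else a human-readable reason."""
    if node_id == "root":
        return "cannot remote-run the root server on itself (cf-runagent limitation)"
    try:
        # "4.0.4.rc1.git201702280322" -> (4, 0); raises ValueError on odd formats.
        major, minor = (int(x) for x in agent_version.split(".")[:2])
    except ValueError:
        return "unparseable agent version: %r" % agent_version
    if (major, minor) < (4, 1):
        return "agent %s does not accept the 4.1 remote-run command" % agent_version
    return None
```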
Updated by François ARMAND over 7 years ago
- Tags deleted (Blocking 4.1)
- Category set to Relay server or API
- Assignee changed from Nicolas CHARLES to Benoît PECCATTE
Updated by François ARMAND over 7 years ago
I'm leaving this ticket open to change the Relay API so that it makes the correct call to the rudder agent. I'm opening a subticket to fix the exception on the Rudder side (the NumberFormatException in the webapp log above), which should not happen.
Updated by Alexis Mousset over 7 years ago
- Status changed from New to In progress
- Assignee changed from Benoît PECCATTE to Alexis Mousset
Updated by Alexis Mousset over 7 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1270
Updated by Alexis Mousset over 7 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder-packages|b6ca5a9f94c4c33be1a2649e5489dbf9e1f53dff.
Updated by Nicolas CHARLES over 7 years ago
- Related to User story #10314: Document remote-run exec compatibility added
Updated by François ARMAND over 7 years ago
- Subject changed from Remote-run exec for root and nodes behind relays fail with "rudder agent was interrupted" to Remote-run exec for root fail with "rudder agent was interrupted"
Updated by Vincent MEMBRÉ over 7 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 4.1.0~rc1, which was released today.
- 4.1.0~rc1: Announce Changelog
- Download: https://www.rudder-project.org/site/get-rudder/downloads/