Bug #2846
closed
During the use of rudder-init.sh, jetty needs to be stopped but the operation takes an infinite time
Description
Sometimes, the command "/etc/init.d/jetty stop" called from the rudder-init.sh script waits (sleeps) indefinitely.
    1643 ? Ss 0:00 \_ sshd: root@notty
    1645 ? Ss 0:00 | \_ /bin/bash /tmp/script-rudder-snapshot.sh
    5150 ? S 0:00 | \_ /bin/bash /opt/rudder/bin/rudder-init.sh rudder-snapshot-2.4.normation.com no yes yes 192.168.0.0/24
    5287 ? Ss 0:01 | \_ /opt/rudder/sbin/cf-agent
    7062 ? S 0:00 | \_ sh -c /etc/init.d/jetty restart </dev/null >/dev/null 2>/dev/null
    7063 ? S 0:00 | \_ bash /etc/init.d/jetty restart
    7086 ? S 0:20 | \_ bash /etc/init.d/jetty stop
    26593 ? S 0:00 | \_ sleep 1
    [...]
    7400 ? Sl 1:00 /usr/lib/jvm/java-6-sun/bin/java -server -Xms1024m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m [...]
The jetty script is unblocked when I kill the java process (here, PID 7400).
Updated by Jonathan CLARKE over 12 years ago
- Assignee deleted (Nicolas PERRON)
- Priority changed from 1 (highest) to 4
- Target version changed from 2.4.0~beta4 to 2.4.0~rc1
This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.
Updated by Jonathan CLARKE over 12 years ago
- Status changed from New to 2
- Priority changed from 4 to 3
Updated by Jonathan CLARKE over 12 years ago
- Target version changed from 2.4.0~rc1 to 2.4.0~rc2
Updated by Nicolas PERRON over 12 years ago
- Status changed from 2 to Discussion
Jonathan CLARKE wrote:
This init script should be changed to test for the application running until a TIMEOUT is reached. Then, we should force kill it, similarly to the rudder-agent init script.
In fact, a timeout of 30 seconds already exists in the stop process of /etc/init.d/jetty:
    [...]
    TIMEOUT=30
    while running "$JETTY_PID"; do
      if (( TIMEOUT-- == 0 )); then
        start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
      fi
      sleep 1
    done
    [...]
    TIMEOUT=30
    while running $JETTY_PID; do
      if (( TIMEOUT-- == 0 )); then
        kill -KILL "$PID" 2>/dev/null
      fi
      sleep 1
    done
    [...]
I have seen this issue and know that it exists, although I can't reproduce it. I don't know how to deal with this problem.
Updated by Nicolas PERRON over 12 years ago
- Target version changed from 2.4.0~rc2 to 2.4.0~rc1
Updated by Nicolas PERRON about 12 years ago
- Assignee changed from Nicolas PERRON to Jonathan CLARKE
Jon, I don't know how to deal with this problem.
jetty seems to hang randomly when launched by rudder-init.sh, but I can't reproduce this bug. When the bug occurs, the only action I can take is to kill the "bash /etc/init.d/jetty stop" process.
As explained above, a TIMEOUT is already defined in the /etc/init.d/jetty file. The only odd thing is that "/etc/init.d/jetty restart" calls "/etc/init.d/jetty stop" instead of using a bash function stop().
Updated by Jonathan CLARKE about 12 years ago
- Assignee changed from Jonathan CLARKE to Nicolas PERRON
Looking at this code:
    TIMEOUT=30
    while running "$JETTY_PID"; do
      if (( TIMEOUT-- == 0 )); then
        start-stop-daemon -K -p"$JETTY_PID" -d"$JETTY_HOME" -a "$JAVA" -s KILL
      fi
      sleep 1
    done

It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:
- The while loop will continue looping until the command "running $JETTY_PID" doesn't return 0.
- The command "running $JETTY_PID" returns -1 when that PID does not match any currently running processes... not exactly a clear definition of "Jetty is stopped".
- If this loops 30 times, and the TIMEOUT variable is exactly equal to 0, then the script sends a KILL signal to that PID.
- In case the PID still exists, the TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, and the PID to be reused by another process, and hey presto we're in an infinite loop.
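The countdown behaviour described above can be checked with a small sketch. It is not the init script itself: running() is replaced by a hypothetical stub that always reports the process as alive, the sleep is left out, and the loop is capped by an iteration count (an assumption added only so the demonstration terminates). It counts how often the KILL branch is entered.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the init script's running(): always reports
# "still running", as if the PID were never released.
running() { return 0; }

# Count how many times the KILL branch is entered over a fixed number of
# loop iterations. The iteration cap exists only so this sketch terminates.
count_kill_attempts() {
    local iterations=$1
    local TIMEOUT=30
    local kills=0
    local i=0
    while running && (( i < iterations )); do
        if (( TIMEOUT-- == 0 )); then
            kills=$(( kills + 1 ))   # the real script sends KILL here
        fi
        i=$(( i + 1 ))
    done
    echo "$kills"
}

count_kill_attempts 100   # prints 1
```

Because the test is an exact equality with 0 and TIMEOUT keeps decreasing afterwards, the branch is entered once (on the 31st pass) however long the loop keeps running.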
When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:
    i=1
    while [ -e /proc/$PID ]
    do
      if [ $i -eq $TIMEOUT ]
      then
        # Timeout
        message "alert" "[ALERT] ${CFENGINE_COMMUNITY_NAME[$daemon]} still running (PID $PID), try: $0 forcestop"
        exit 1
      fi
      i=`expr $i + 1`
      sleep 1
    done
I suggest that we add an if statement to each of the while loops you quoted above, like this:
    if (( TIMEOUT < -10 )); then
      echo "Failed to stop Jetty. Giving up."
      break
    fi
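Put together, a stop loop with this extra check might look like the sketch below. This is only an illustration under assumptions, not the patch that was applied: is_running, wait_or_give_up and the GRACE and STEP parameters are hypothetical names, the liveness test uses kill -0 on a PID rather than the init script's running() helper, and the function returns 1 where the suggestion above uses break.

```shell
#!/usr/bin/env bash
# Hypothetical liveness check: succeeds while a process with this PID exists.
is_running() { kill -0 "$1" 2>/dev/null; }

# Wait for PID to exit. Send KILL once the countdown reaches 0, then give up
# GRACE iterations later instead of looping forever.
# Arguments: PID [TIMEOUT=30] [GRACE=10] [STEP=1 second between checks]
wait_or_give_up() {
    local pid=$1
    local TIMEOUT=${2:-30}
    local GRACE=${3:-10}
    local STEP=${4:-1}
    while is_running "$pid"; do
        if (( TIMEOUT-- == 0 )); then
            kill -KILL "$pid" 2>/dev/null
        fi
        if (( TIMEOUT < -GRACE )); then
            echo "Failed to stop Jetty. Giving up."
            return 1
        fi
        sleep "$STEP"
    done
    return 0
}
```

A caller could then run something like `wait_or_give_up "$pid" 30 10 || exit 1`, so that a stuck stop surfaces as a failure of the init script rather than as a hang.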
Updated by Nicolas PERRON about 12 years ago
- Status changed from Discussion to In progress
Jonathan CLARKE wrote:
Looking at this code:
[...]
It seems that although there is a "timeout" variable, it is not a real timeout, just a countdown. Let's read through the code to see what is happening:
- The while loop will continue looping until the command "running $JETTY_PID" doesn't return 0.
- The command "running $JETTY_PID" returns -1 when that PID does not match any currently running processes... not exactly a clear definition of "Jetty is stopped".
- If this loops 30 times, and the TIMEOUT variable is exactly equal to 0, then the script sends a KILL signal to that PID.
- In case the PID still exists, the TIMEOUT continues to be decremented, but the KILL signal is never sent again. This can "easily" (given the right circumstances) end up in an infinite loop: all we need is for the KILL signal not to be sent, or not to be effective, and the PID to be reused by another process, and hey presto we're in an infinite loop.
When I originally suggested implementing a timeout, I meant a timeout after which we give up. Look at this extract from our rudder-agent init script:
[...]
I suggest that we add an if statement to each of the while loops you quoted above, like this:
[...]
Ok, I understand now.
Your analysis and solution seem clear to me, and I agree. I will add a patch to our packaging in order to add these if statements.
Updated by Nicolas PERRON about 12 years ago
- Status changed from In progress to Pending technical review
- % Done changed from 0 to 100
Applied in changeset commit:3fe72c583bcc2a563bef9d0dba695c41d0d33d91.
Updated by Jonathan CLARKE about 12 years ago
- Status changed from Pending technical review to Released
Looks good to me, thanks Nico.
Updated by Nicolas PERRON almost 12 years ago
- Project changed from Rudder to 34
- Category deleted (11)
Updated by Benoît PECCATTE almost 10 years ago
- Project changed from 34 to Rudder
- Category set to Packaging