Bug #16004
closedUnexpected crash of webapp on 6.0
Description
I had a fresh 6.0 install, worked fined. I let it runs without doing anything, ~30min later, I came back and found my webapp stuck and not able to click on any buttons.
Webapp logs where displaying lot of messages like:
[2019-10-18 11:21:28+0200] INFO policy.generation - Successful policy update '3' [started 2019-10-18 11:21:26 - ended 2019-10-18 11:21:28] [2019-10-18 11:41:43+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms. [2019-10-18 11:41:43+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30006 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2019-10-18 11:41:59+0200] WARN scheduledJob - Error when trying to clean log reports from report table. <- An error occured while deleting log reports <- HikariPool-1 - Connection is not available, request timed out after 30000ms. [2019-10-18 11:42:01+0200] ERROR scheduledJob - could not fetch last id, don't log anything, wait next run, cause is Error when parsing property 'reportLoggerLastId' [2019-10-18 11:42:18+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms. [2019-10-18 11:42:18+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30002 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2019-10-18 11:42:53+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms. [2019-10-18 11:42:53+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30002 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists. [2019-10-18 11:43:28+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms. [2019-10-18 11:43:28+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30003 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.
I had to reboot the VM, allow 2 more cpus and now eveything is working fine again. I have no idea if it was the reboot or the cpu allocation that fixed it.
Given the error messages, it looks like the database wasn't repsonding anymore. It may have been due to the system being under load or b/c of some thread leak (when threads accumulate, the eat all CPU just for switching between them, which is not great).
If the problem happen again, before stopping the JVM, can you do a threadump?
To get a threadump, you need a jdk (openjdk-development) to get jstack command, and then:
jstack -l <jvm-pid> > <file-path>
Files
Updated by Nicolas CHARLES about 5 years ago
- Status changed from New to In progress
- Assignee set to Nicolas CHARLES
Updated by Nicolas CHARLES about 5 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Nicolas CHARLES to François ARMAND
- Pull Request set to https://github.com/Normation/rudder/pull/2540
Updated by Nicolas CHARLES about 5 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder|c4b4fbbdce0f002d6bb60230555b6fcc03bfa27e.
Updated by Vincent MEMBRÉ about 5 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.0.0~beta1 which was released today.