Bug #16004: Unexpected crash of webapp on 6.0 - Rudder - Issue Tracker

Actions

Copy link

Bug #16004

closed

Unexpected crash of webapp on 6.0

Added by Nicolas CHARLES over 5 years ago. Updated over 5 years ago.

Status:

Released

Priority:

N/A

Assignee:

François ARMAND

Category:

Target version:

6.0.0~beta1

Pull Request:

https://github.com/Normation/rudder/p...

Severity:

Major - prevents use of part of Rudder | no simple workaround

UX impact:

User visibility:

Effort required:

Priority:

Name check:

To do

Fix check:

To do

Regression:

Description

I had a fresh 6.0 install, worked fined. I let it runs without doing anything, ~30min later, I came back and found my webapp stuck and not able to click on any buttons.
Webapp logs where displaying lot of messages like:

[2019-10-18 11:21:28+0200] INFO policy.generation - Successful policy update '3' [started 2019-10-18 11:21:26 - ended 2019-10-18 11:21:28]
[2019-10-18 11:41:43+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms.
[2019-10-18 11:41:43+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30006 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.
[2019-10-18 11:41:59+0200] WARN scheduledJob - Error when trying to clean log reports from report table. <- An error occured while deleting log reports <- HikariPool-1 - Connection is not available, request timed out after 30000ms.
[2019-10-18 11:42:01+0200] ERROR scheduledJob - could not fetch last id, don't log anything, wait next run, cause is Error when parsing property 'reportLoggerLastId'
[2019-10-18 11:42:18+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms.
[2019-10-18 11:42:18+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30002 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.
[2019-10-18 11:42:53+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms.
[2019-10-18 11:42:53+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30002 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.
[2019-10-18 11:43:28+0200] ERROR report - Could not get node executions reports from the RudderSysEvents table cause is: Could not get node execution status <- Error when retrieving 'executionStatus' from db: HikariPool-1 - Connection is not available, request timed out after 30000ms.
[2019-10-18 11:43:28+0200] WARN application - [Store Agent Run Times] Task frequency is set too low! Last task took 30003 ms but tasks are scheduled every 5000 ms. Adjust rudder.batch.storeAgentRunTimes.updateInterval if this problem persists.

I had to reboot the VM, allow 2 more cpus and now eveything is working fine again. I have no idea if it was the reboot or the cpu allocation that fixed it.

Given the error messages, it looks like the database wasn't repsonding anymore. It may have been due to the system being under load or b/c of some thread leak (when threads accumulate, the eat all CPU just for switching between them, which is not great).

If the problem happen again, before stopping the JVM, can you do a threadump?

To get a threadump, you need a jdk (openjdk-development) to get jstack command, and then:

jstack -l  <jvm-pid> > <file-path>

Files

dump_log (290 KB) dump_log

Nicolas CHARLES, 2019-10-21 12:22

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Rudder

Custom queries

Bug #16004

Unexpected crash of webapp on 6.0

Updated by Nicolas CHARLES over 5 years ago

Updated by Nicolas CHARLES over 5 years ago

Updated by Nicolas CHARLES over 5 years ago

Updated by Nicolas CHARLES over 5 years ago

Updated by Nicolas CHARLES over 5 years ago

Updated by Vincent MEMBRÉ over 5 years ago