Bug #18955

closed

Fatal exception doesn't cause rudder to stop anymore

Added by François ARMAND almost 4 years ago. Updated over 3 years ago.

Status: Released
Priority: N/A
Category: System integration
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check: To do
Fix check: Error - Next version
Regression:

Description

RESOLUTION NOTE: the provided patch uses the OpenJDK boot parameter "CrashOnOutOfMemoryError", which is only available since OpenJDK 1.8.0_92. You should really use a more recent OpenJDK than that for performance and security reasons, but if you are stuck with a 6-year-old runtime, you can edit the file /opt/rudder/etc/rudder-jetty.conf and remove that parameter.
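
One way to check whether a given runtime recognizes the option before relying on it is to ask the JVM itself. The sketch below is only an illustration and is not part of the patch: it assumes a HotSpot-based JVM that exposes the JDK's `com.sun.management.HotSpotDiagnosticMXBean`, and the class and method names (`CrashOnOomSupport`, `isVmOptionKnown`) are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not Rudder code: reports whether the running JVM
// knows a given -XX option such as CrashOnOutOfMemoryError.
final class CrashOnOomSupport {

    /** Returns true if the running HotSpot JVM knows the named VM option. */
    static boolean isVmOptionKnown(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return false; // assumption: no HotSpot diagnostic bean means no such option
        }
        try {
            bean.getVMOption(name); // throws IllegalArgumentException if unknown
            return true;
        } catch (IllegalArgumentException unknownOption) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("CrashOnOutOfMemoryError known: "
            + isVmOptionKnown("CrashOnOutOfMemoryError"));
    }
}
```

Running it with the same `java` binary that starts Rudder should indicate whether the parameter can be kept in /opt/rudder/etc/rudder-jetty.conf.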

In #14281 we added a handler for fatal exceptions so that they force Rudder to stop (because from that point on, Rudder is broken in unknown ways).
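
For readers unfamiliar with the mechanism, a handler of this kind can be sketched roughly as follows. This is not Rudder's actual implementation: the class name `FatalExceptionHandler`, the matching on superclasses, and the injected exit action are all assumptions made for illustration, and the set of class names merely stands in for whatever `rudder.jvm.fatal.exceptions` contains.

```java
import java.util.Set;
import java.util.function.IntConsumer;

// Hypothetical sketch: stop the JVM when an uncaught exception belongs
// to a configured list of "fatal" exception classes.
final class FatalExceptionHandler implements Thread.UncaughtExceptionHandler {
    private final Set<String> fatalClassNames;
    private final IntConsumer exit; // injected so the behavior can be tested

    FatalExceptionHandler(Set<String> fatalClassNames, IntConsumer exit) {
        this.fatalClassNames = fatalClassNames;
        this.exit = exit;
    }

    /** True if the throwable's class, or one of its superclasses, is configured as fatal. */
    boolean isFatal(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            if (fatalClassNames.contains(c.getName())) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        if (isFatal(t)) {
            System.err.println("FATAL exception in thread '" + thread.getName() + "': " + t);
            exit.accept(1);
        }
    }

    public static void main(String[] args) {
        // Install the handler for every thread, then simulate a fatal error.
        Thread.setDefaultUncaughtExceptionHandler(new FatalExceptionHandler(
            Set.of("java.lang.Error"),
            code -> Runtime.getRuntime().halt(code)));
        throw new InternalError("simulated fatal error");
    }
}
```

Note that the logging line in this sketch allocates memory (string concatenation), which matters when the fatal exception is an OutOfMemoryError.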

But now, Rudder doesn't stop on them:

[2021-02-24T15:27:59.413Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:28:40.190Z] ERROR FATAL exception in thread 'pool-2-thread-14' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:29:01.906Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:30:02.031Z] ERROR FATAL exception in thread 'JettyShutdownThread' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:26.929Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.738Z] ERROR FATAL exception in thread 'Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:389, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=2)' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:21.339Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.737Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.748Z] ERROR FATAL exception in thread 'zio-default-blocking-42' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:34:25+0000] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.747Z] ERROR FATAL exception in thread 'Connection reader for connection 141 to localhost:389' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.766Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.766Z] ERROR FATAL exception in thread 'pool-2-thread-12' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:32:21+0000] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m37s705ms416µs741ns).
[2021-02-24 15:34:55+0000] INFO  inventory-processing - Report 'era-afd88dc6-14ef-4782-b50f-e1db3f442449.ocs' for node 'era.ad.cullinanstudio.com' [afd88dc6-14ef-4782-b50f-e1db3f442449] (signature:certified) processed in 11 minutes, 1 second and 965 milliseconds ms
[2021-02-24 15:34:55+0000] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=5m37s929ms914µs588ns).
[2021-02-24 15:34:56+0000] INFO  inventory-processing - Report 'cs-lxd-843bde7c-7c54-45ea-aafa-545375fa1763.ocs' for node 'cs-lxd.ad.cullinanstudio.com' [843bde7c-7c54-45ea-aafa-545375fa1763] (signature:certified) processed in 1 second and 263 milliseconds ms
[2021-02-24 15:39:26+0000] ERROR report.cache - Error when updating compliance cache for nodes: [83df0302-c646-48d4-960c-206eb7d6e76b, 6a7065bd-2857-44a5-a793-299aaf73f877, 64a9fb85-eb95-4e77-b473-bb36d6c755ce, 87257904-cd38-4f26-8bb7-86a95b616b31, 2f8a8a3e-e61e-493f-845b-411efd5e5cc1, eb746532-5c6d-41e9-a2cd-9cb5c359f754, dc4e6ebb-facd-4782-92e7-507c389c1d55, 98524e4d-bbe9-4cc5-a73a-6221ad316000, bc1f48e7-40ed-424f-864e-a36b0f600823, d4632740-3096-4a0f-a960-c58edcfe153f, 3fa39ba8-6134-4d15-a19c-47757443a3e6, ca75c6a1-3dcf-4f1c-a940-458d41955731, b01da0ae-a85e-43c6-90e2-0a97f6b962a9, 431d6b5b-3a63-491b-a545-dcd00149e74a, 0164debd-801f-4dbf-8286-b91f961db385, 012a5fa4-5cae-4a0f-aff0-8cef73605914, 68bfec3d-3148-48da-8936-00d6c6353bb8, 9d7a119e-dae5-401e-bd59-955189442d6e, 801ffa9b-da14-4f3a-9d79-7e25fff83a8b, 68eac959-2a13-48ff-9743-fb189da3751d, dd5da446-4465-4412-898d-d5fad325b057, 29bcf98b-7a98-4fd9-9c65-ca6f558523a1, 72d7d23a-49c3-40a9-b19e-ccca54f977b4, 1c4f8f91-8acf-4041-9182-15b47162c91f, ea1c1a5e-b035-4b30-af11-343d1ca80ac6, 16312315-eaf7-48f2-a738-2a8cc1a18277, 153b1a15-d18a-4382-99c5-cfe28c32040c, e6c5b022-531d-4ac7-ab42-074401be8b29, root, 6abcaf43-a53f-495c-92d9-092ae732f8b7, ebf278c9-18ae-4dc3-a860-0d4db34c0bed, 8651f72c-e34b-4fe9-9fc1-299faeb4f624, a32f001d-1eaf-41ad-bf32-e548e0f796b5, fdd9476f-7b5e-44f1-a0ad-71578250eacc, 900ecf39-cfd5-44e0-9d1d-1d539694fd05, 510bda45-631a-4708-a98e-5db91ee31f49, 7e9d00c1-949c-4666-b102-1bb3d7cb9d87, 193e9378-731b-4246-9a69-251b05f537c8, fff40e4d-694f-485e-b2c0-bacc43b8b482, 020dfd93-295d-408f-b934-34cee86d882f, 3ce75c4f-b041-4c15-877b-e3d441d83bd8, ad063ee1-6a7e-4edc-a028-f36b76cc17ea, bad452a7-36e7-4956-b29b-d2f58d2ad547, e27a8c6e-c5f2-480f-a7c7-05b1e271e078, 4b7b1e60-ee4d-44db-9409-f12b9f9de751, 0f18bc42-e1cc-4ee4-a0ef-b376eef08806, ef503e99-b19d-4a4a-aadc-fd57c64fcfa5, e2f589ae-d1c1-4a99-b67c-57d439f0efed, 2da6b244-019b-4cc4-831e-f0aa0d7c7eba, eac1ddac-df1a-4f18-8760-053a29b3ec57, 
f80fb2df-107a-4b1c-a1a7-af935b3d6dd2, 93b2bf80-9e3f-47ee-8a45-f4a570f5ea00, 39461107-981b-4e04-aabc-468d9f7cf7f0, afd88dc6-14ef-4782-b50f-e1db3f442449, a820f4cb-5526-4556-826b-d0c9f5f9881e, f6d08522-9cdf-4c40-8845-ef8655db964e, 292d1824-6688-422f-8cff-5cab708ab3f3, 801eafab-0542-4c6a-842b-956adcab89f4, 0176c790-c2f6-4de9-b231-d1b1e23285ac, 00404802-6de4-4920-8cbd-7471670d4392, 1a12816a-b2c4-4d68-a9bd-f371223176d0, 88f6ae7c-a27a-46ee-b410-e0e7fceb0822, 9f439b06-3f86-4828-8161-e5265d201239, acf230f8-914f-4c53-a061-fc3e1994bf92, 6a2f7f67-2910-46ad-9652-ddcc30a3ba67, 8f20e32d-4d4e-4e72-a3fa-184e0a9bfe5c, 31c74e40-5b60-4b79-a341-975a9bb6832b, 106e41c6-2098-4fa0-8305-5f7f2c99245a, 99cc43a8-8b0a-4ffb-844f-65af399097f2, 84b7e2eb-1bf6-4c01-bd9d-e83628fcb372, 6d794d33-9819-4c90-94bd-83e9546c60f6, c6bafff0-fda3-46f2-936e-11a11da8500d, 292e684f-a0f6-4e3c-ab41-860874800576, 0eab4144-d059-43f5-8a96-9779448a4eae, 4e29d7c7-b6bf-4b6b-949b-78a842ab4c26, f87c6d7d-2eec-4ef9-95d2-7a7bc4e80c98, 1a028fe7-ecfd-43c7-b34c-074275e5de7f, 04f67535-18e2-44bc-8343-8bd154bbe195, aa8f328a-f550-4767-b74d-6f47893d3efe, 91a299f2-0723-46a2-ba10-3116878d119e, e9932b2f-16bf-4e35-a7f2-ed04e2764ca6, 79782fa3-28a5-4d6a-bf5c-528adb5bc4e7, 71f11978-ba06-48bb-87bd-ab0c0baf1e1a, 0be487b0-e2dd-4e60-a2e9-86e831a7b8d3, 97a2ada6-6630-4f3a-87ed-694bb00be07d, 23c67ca3-6aae-4e26-90db-07a849490eb7, 232773fc-d6c4-4b95-ae20-fe2d15cbdd7e, 8c422b5d-a571-41be-b9c0-1bf54efb1ca9, 0bc1f85d-026a-4ef2-bb92-a204f95593e6, d93a4b3f-7739-4e3e-a3f4-f0fa1638c99d, f0f7072c-bd1f-48e9-b5ed-27143cad87e3, cbb05021-3ad9-4639-ae2b-cc89d15359f9, 4d2857dd-1b97-4242-ac78-3901d06959d1, 9336448a-3171-4da6-8d7e-8111aaa86b03, 8046ff06-bb9c-4fc2-ad9d-aa89fcab87e1, 283cc90a-7397-49c9-a668-16664f6eb83f, 6906cbf2-832f-4443-8b16-f8fdbdc454d2, 73134710-f1be-44be-8358-791a01cb5cdc, fcf0fa40-d809-4afa-8f54-1163ff33acab, b51dd2a6-a9c3-4c13-b00d-0185acb3cc11, d8d4680f-61b3-4b38-bb2b-f049a27ef39e, 3cfa5756-08a7-4321-9684-8f40f4f33258, 
50641570-753c-49a2-83d9-fc29cf06ab4c, adf8aa38-7cdf-4502-a8de-3632bfeadbc7, 8dbf051d-7a95-4e80-9625-1f8df5036b90, b6775c30-8c93-4dd3-a1b2-c3c6a1fae5fc, 774934cf-4014-4ff5-ad80-ab19467720a9, 0e2e1829-e56c-4e63-9580-27e1db0d5992, e908812a-f8d8-45c7-9149-60df0069b772, a57c75c5-7fe7-4550-8d17-1a291430db5e, 890bfdfe-8564-4ec6-9143-3cfe3a7f3bb7, 4ee13920-8ff7-4f0b-ba87-d98cc47b4420, b5cc63b7-4fc6-43da-b7d4-2ae6087d2c92, fdfe3b9d-eb1c-4d45-998a-344e284e0134, 5f10e7a2-c9ce-4410-91c7-577ca9e688d7, 83420577-99d5-4c12-8332-c46c9d2d8cd4, ad4cafd6-29a3-4905-8a46-8cb62c3aac47, f542e332-18e7-45c7-a647-6819341250c4, ea547246-2597-49e8-abc0-b06c426a4271, 45a4be80-0758-4b15-b331-74582b1c6704, ed17924f-3385-4afe-86a7-73ff06640b75, 70027876-60e7-41b3-9f75-c9329153e8d6, 56a99b2e-3020-4d3b-a8bc-ace60b1a4327, 2da1f24b-e5ba-49e0-a405-46208ee2f32d, bbb33619-df86-435d-bda3-8c1d6a6e5e89, 7c9a0647-bc50-4575-80b3-eec19848fff5, e0c854fa-50ec-41ce-9bf3-614b9adc25ec, 88031eca-e2e7-450a-af85-839822971164, 08e0ca64-1e37-41a0-99d1-120c94ae1c91, 7afa0b5e-07e0-4238-9707-e6361414f065, f1dfa384-00bd-4643-be7d-462395f5e3f6, 37eb69d5-2d19-48c9-97d6-29b27dec299c, 843bde7c-7c54-45ea-aafa-545375fa1763, 78d79f85-c368-47a3-b231-5b0783aea27b, b440bb48-e214-4932-a3f9-160a79417ca3, 65266c5d-0108-4fe0-99a1-f824c470e0b5, 95caeed3-cda4-4f4c-b9b9-2e5c7a97ab6b, 79125b40-2f81-42c4-8383-bc8f012db906, 325676e7-dda7-473e-afa2-d6d3d09e7217, 465844a0-6b4e-41d8-bc63-f7780853619a, b0b12eb4-ffa6-4fb7-ab6f-eeb1f085a66e, 37c6ccd0-14b6-4e7b-a1ef-5b5fde24d1be, 8b7128a2-3f50-4e1f-b92b-fb52a081e408, f33caf20-fd02-4e01-b7e7-138e0518d1b6, aafb541c-e0d0-4345-8d5c-54856f22eec1, d585f3ad-297f-4b97-9d56-58920657636a, aca34e01-3e03-46f3-9a4b-ed653af6416a, 04cb40de-0bd5-4d51-bc15-e6f527894409, 537fea5d-58f6-4d69-97ad-eabbdc06e0f9, a2ebface-5ad6-4f08-ace9-332c3a249636, 168d953c-32e7-452a-b654-7f70ae80dca9, e31ae0d9-a864-4ee1-b462-a81a79236445, 501e4d17-e443-478f-ba42-11ae299ed391, 8befd795-d361-4e1f-bef8-69af8167780a, 
8d5c514c-c8a0-4a26-b8e0-0cc79fb1e27e, 47759878-2d1b-4f20-b504-6037fb817034, b8d40ac5-2943-4fff-9c2c-241048a5bf05, 18b3ca8c-c3e1-4e48-994a-e52bc5f4ba2f, 12ee5b1a-c0eb-4a7f-b87c-d229b5c631b9, 0750cc88-0cea-4276-bd86-3616dfb8bb04, 68ca441c-18ad-46ca-a46b-8cf0c6373bc4, beb99eda-275c-452c-a53e-2d35c22b7da3, 6e7b9aef-250e-41dc-afb5-c78d61e19f30, 49d4be0c-887d-458f-b97b-3e1d44783ffd, 8cee109d-f58f-4d1d-8666-15d6b03df745, cc90310c-c8b6-4280-8613-29b9ceaab0fa, a8d1cf92-9e51-4253-a848-de542607e524, 7744604b-f46f-4656-8217-090cd6042603, b5d9a984-2077-4430-9d73-85245e4efca4, a53426ce-4dcb-4278-b6c2-c4d1fffe0b74, f9867bcc-2b1f-4e3e-a75d-a638768df723, 843803c8-9916-4709-bc8d-e029cd7d4604, d2868f4e-7a30-4910-af9c-f2c4eccd399b]: Unexpected: BackendException: Error during search ou=Application Properties,cn=rudder-configuration SUB: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope SUB, and filter '(objectClass=property)' from server localhost:389.; cause was: com.unboundid.ldap.sdk.LDAPSearchException: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope SUB, and filter '(objectClass=property)' from server localhost:389. 
 -> com.normation.ldap.sdk.RoLDAPConnection.$anonfun$search$1(LDAPConnection.scala:321)
[2021-02-24 15:39:26+0000] WARN  explain_compliance.2da6b244-019b-4cc4-831e-f0aa0d7c7eba - Received a run at 2021-02-24T15:20:55.000Z for node '2da6b244-019b-4cc4-831e-f0aa0d7c7eba' configId '20200826-084123-9fa3d540' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-25e62ed
[2021-02-24 15:39:26+0000] WARN  explain_compliance.292d1824-6688-422f-8cff-5cab708ab3f3 - Received a run at 2021-02-24T15:24:19.000Z for node '292d1824-6688-422f-8cff-5cab708ab3f3' configId '20200826-100925-ac5525b9' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-fc114038
....

Subtasks 1 (0 open, 1 closed)

Bug #18961: typo in parent ticket (Released, François ARMAND)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #18982: on JVM8, there's no OOM log (Rejected, François ARMAND)
Actions #1

Updated by François ARMAND almost 4 years ago

  • Status changed from New to In progress
Actions #2

Updated by François ARMAND almost 4 years ago

The problem seems to be that the "catch" part contains an e.printStackTrace() call, which requires memory allocation and therefore fails when the heap is already exhausted.

The only correct way to handle an OOM error is with the built-in JVM parameter -XX:+CrashOnOutOfMemoryError.
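
To illustrate the failure mode, here is a sketch of a handler that tries to avoid allocating once an OutOfMemoryError has been thrown: the message bytes and the output stream are created up front, and the halt happens in a `finally` block. The names (`LowAllocationOomHandler`, `MESSAGE`) are hypothetical and this is not the approach taken in the patch. Even such a handler is best effort only, since any remaining step may still need memory, which is why the JVM parameter is the preferred solution.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

// Hypothetical sketch: a best-effort OOM handler that does its allocations
// before the error occurs, instead of calling printStackTrace() afterwards.
final class LowAllocationOomHandler implements Thread.UncaughtExceptionHandler {
    // Built eagerly, while memory is still available.
    static final byte[] MESSAGE =
        "FATAL: OutOfMemoryError caught, halting JVM\n".getBytes(StandardCharsets.US_ASCII);

    private final FileOutputStream err = new FileOutputStream(FileDescriptor.err);
    private final IntConsumer halt; // injected so the behavior can be tested

    LowAllocationOomHandler(IntConsumer halt) {
        this.halt = halt;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        if (!(t instanceof OutOfMemoryError)) {
            return; // this sketch only deals with OOM
        }
        try {
            err.write(MESSAGE); // no string concatenation, no stack trace rendering
        } catch (Throwable ignored) {
            // Logging may still fail; stopping matters more than the message.
        } finally {
            halt.accept(1);
        }
    }

    public static void main(String[] args) {
        Thread.setDefaultUncaughtExceptionHandler(
            new LowAllocationOomHandler(code -> Runtime.getRuntime().halt(code)));
        throw new OutOfMemoryError("simulated");
    }
}
```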

Actions #3

Updated by François ARMAND almost 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/2431
Actions #4

Updated by François ARMAND almost 4 years ago

  • Status changed from Pending technical review to Pending release
Actions #5

Updated by Nicolas CHARLES almost 4 years ago

On my test system, it stopped Rudder immediately; it didn't even have a chance to log the error.

Actions #6

Updated by Nicolas CHARLES almost 4 years ago

journalctl helpfully told me:

Mar 05 10:15:05 server su[898]: pam_unix(su:session): session opened for user root by vagrant(uid=0)
Mar 05 10:15:05 server systemd-coredump[773]: Process 22824 (java) of user 0 dumped core.

                                              Stack trace of thread 16893:
                                              #0  0x00007f6b7b0b293f raise (libc.so.6)
                                              #1  0x00007f6b7b09cc95 abort (libc.so.6)
                                              #2  0x00007f6b7a21e9df _ZN2os5abortEb.cold.54 (libjvm.so)
                                              #3  0x00007f6b7ab91163 _ZN7VMError14report_and_dieEv (libjvm.so)
                                              #4  0x00007f6b7a4fbdf3 _Z25report_java_out_of_memoryPKc (libjvm.so)
                                              #5  0x00007f6b7ab53948 _ZN14TypeArrayKlass15allocate_commonEibP6Thread (libjvm.so)
                                              #6  0x00007f6b7aa04720 _ZN11OptoRuntime11new_array_CEP5KlassiP10JavaThread (libjvm.so)
                                              #7  0x00007f6b6506d107 n/a (n/a)

Actions #7

Updated by Nicolas CHARLES almost 4 years ago

I do have a /opt/rudder/etc/rudder-jetty-base/hs_err_pid22824.log, but that's not so easy to spot.
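
Since the crash report is easy to miss, a small helper can list such files. The sketch below is an illustration only: the class name `HsErrFinder` is hypothetical, and the default directory is simply the one mentioned in this comment, so it may differ on other installations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: list JVM crash reports (hs_err_pid*.log) in a directory.
final class HsErrFinder {

    /** Returns the hs_err_pid*.log files found directly in dir, sorted by path. */
    static List<Path> findCrashLogs(Path dir) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                found.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        // Directory taken from the comment above; pass another one as first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/rudder/etc/rudder-jetty-base");
        for (Path p : findCrashLogs(dir)) {
            System.out.println(p);
        }
    }
}
```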

Actions #8

Updated by Nicolas CHARLES almost 4 years ago

  • Fix check changed from To do to Error - Next version
Actions #9

Updated by François ARMAND almost 4 years ago

  • Description updated (diff)
Actions #10

Updated by Vincent MEMBRÉ almost 4 years ago

This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released today.

Actions #11

Updated by Vincent MEMBRÉ almost 4 years ago

  • Related to Bug #18982: on JVM8, there's no OOM log added
Actions #12

Updated by Vincent MEMBRÉ over 3 years ago

This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released by the end of October 2020.

Actions #13

Updated by Vincent MEMBRÉ over 3 years ago

  • Status changed from Pending release to Released