Bug #18955

Fatal exception doesn't cause rudder to stop anymore

Added by François ARMAND about 1 month ago. Updated 24 days ago.

Status: Pending release
Priority: N/A
Category: System integration
Target version:
Severity:
User visibility:
Effort required:
Priority: 0

Description

RESOLUTION NOTE: the provided patch uses the OpenJDK boot parameter "CrashOnOutOfMemoryError", which is only available since OpenJDK 1.8.0_92. You should really use a more recent OpenJDK than that for performance and security reasons, but if you are stuck with a 6-year-old runtime, you can modify the file /opt/rudder/etc/rudder-jetty.conf and remove the parameter.
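
To check whether a given runtime is recent enough for that parameter, the version threshold from the note above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, it assumes the usual "java.version" formats ("1.8.0_92" for Java 8, "11.0.10" for Java 9 and later), and it conservatively answers "no" for anything older than Java 8.

```java
// Hypothetical helper, not part of Rudder: encodes the "1.8.0_92 or later" rule.
final class CrashOnOomSupport {

    /** Returns true if the given java.version string is 1.8.0_92 or newer. */
    static boolean supportsCrashOnOom(String javaVersion) {
        // Split on the separators used by both version schemes: . _ - +
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first >= 9) {
            return true; // new scheme (9, 11.0.10, 17...): always newer than 8u92
        }
        if (first != 1 || parts.length < 2) {
            return false; // unknown format: stay conservative
        }
        int major = Integer.parseInt(parts[1]);
        if (major != 8) {
            return major > 8;
        }
        if (parts.length < 4) {
            return false; // "1.8.0" without an update number
        }
        try {
            return Integer.parseInt(parts[3]) >= 92; // update number of 1.8.0_NN
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " supports -XX:+CrashOnOutOfMemoryError: "
                + supportsCrashOnOom(version));
    }
}
```

Running the class with the same java binary that starts Rudder tells you whether the parameter can stay in the configuration file.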

In #14281 we added a handler for fatal exceptions so that they force Rudder to stop (because from that point on, Rudder is broken in unknown ways).

But now, rudder doesn't stop on them:

[2021-02-24T15:27:59.413Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:28:40.190Z] ERROR FATAL exception in thread 'pool-2-thread-14' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:29:01.906Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:30:02.031Z] ERROR FATAL exception in thread 'JettyShutdownThread' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:26.929Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.738Z] ERROR FATAL exception in thread 'Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:389, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=2)' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:21.339Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.737Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.748Z] ERROR FATAL exception in thread 'zio-default-blocking-42' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:34:25+0000] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.747Z] ERROR FATAL exception in thread 'Connection reader for connection 141 to localhost:389' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.766Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.766Z] ERROR FATAL exception in thread 'pool-2-thread-12' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:32:21+0000] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m37s705ms416µs741ns).
[2021-02-24 15:34:55+0000] INFO  inventory-processing - Report 'era-afd88dc6-14ef-4782-b50f-e1db3f442449.ocs' for node 'era.ad.cullinanstudio.com' [afd88dc6-14ef-4782-b50f-e1db3f442449] (signature:certified) processed in 11 minutes, 1 second and 965 milliseconds ms
[2021-02-24 15:34:55+0000] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=5m37s929ms914µs588ns).
[2021-02-24 15:34:56+0000] INFO  inventory-processing - Report 'cs-lxd-843bde7c-7c54-45ea-aafa-545375fa1763.ocs' for node 'cs-lxd.ad.cullinanstudio.com' [843bde7c-7c54-45ea-aafa-545375fa1763] (signature:certified) processed in 1 second and 263 milliseconds ms
[2021-02-24 15:39:26+0000] ERROR report.cache - Error when updating compliance cache for nodes: [83df0302-c646-48d4-960c-206eb7d6e76b, 6a7065bd-2857-44a5-a793-299aaf73f877, 64a9fb85-eb95-4e77-b473-bb36d6c755ce, 87257904-cd38-4f26-8bb7-86a95b616b31, 2f8a8a3e-e61e-493f-845b-411efd5e5cc1, eb746532-5c6d-41e9-a2cd-9cb5c359f754, dc4e6ebb-facd-4782-92e7-507c389c1d55, 98524e4d-bbe9-4cc5-a73a-6221ad316000, bc1f48e7-40ed-424f-864e-a36b0f600823, d4632740-3096-4a0f-a960-c58edcfe153f, 3fa39ba8-6134-4d15-a19c-47757443a3e6, ca75c6a1-3dcf-4f1c-a940-458d41955731, b01da0ae-a85e-43c6-90e2-0a97f6b962a9, 431d6b5b-3a63-491b-a545-dcd00149e74a, 0164debd-801f-4dbf-8286-b91f961db385, 012a5fa4-5cae-4a0f-aff0-8cef73605914, 68bfec3d-3148-48da-8936-00d6c6353bb8, 9d7a119e-dae5-401e-bd59-955189442d6e, 801ffa9b-da14-4f3a-9d79-7e25fff83a8b, 68eac959-2a13-48ff-9743-fb189da3751d, dd5da446-4465-4412-898d-d5fad325b057, 29bcf98b-7a98-4fd9-9c65-ca6f558523a1, 72d7d23a-49c3-40a9-b19e-ccca54f977b4, 1c4f8f91-8acf-4041-9182-15b47162c91f, ea1c1a5e-b035-4b30-af11-343d1ca80ac6, 16312315-eaf7-48f2-a738-2a8cc1a18277, 153b1a15-d18a-4382-99c5-cfe28c32040c, e6c5b022-531d-4ac7-ab42-074401be8b29, root, 6abcaf43-a53f-495c-92d9-092ae732f8b7, ebf278c9-18ae-4dc3-a860-0d4db34c0bed, 8651f72c-e34b-4fe9-9fc1-299faeb4f624, a32f001d-1eaf-41ad-bf32-e548e0f796b5, fdd9476f-7b5e-44f1-a0ad-71578250eacc, 900ecf39-cfd5-44e0-9d1d-1d539694fd05, 510bda45-631a-4708-a98e-5db91ee31f49, 7e9d00c1-949c-4666-b102-1bb3d7cb9d87, 193e9378-731b-4246-9a69-251b05f537c8, fff40e4d-694f-485e-b2c0-bacc43b8b482, 020dfd93-295d-408f-b934-34cee86d882f, 3ce75c4f-b041-4c15-877b-e3d441d83bd8, ad063ee1-6a7e-4edc-a028-f36b76cc17ea, bad452a7-36e7-4956-b29b-d2f58d2ad547, e27a8c6e-c5f2-480f-a7c7-05b1e271e078, 4b7b1e60-ee4d-44db-9409-f12b9f9de751, 0f18bc42-e1cc-4ee4-a0ef-b376eef08806, ef503e99-b19d-4a4a-aadc-fd57c64fcfa5, e2f589ae-d1c1-4a99-b67c-57d439f0efed, 2da6b244-019b-4cc4-831e-f0aa0d7c7eba, eac1ddac-df1a-4f18-8760-053a29b3ec57, 
f80fb2df-107a-4b1c-a1a7-af935b3d6dd2, 93b2bf80-9e3f-47ee-8a45-f4a570f5ea00, 39461107-981b-4e04-aabc-468d9f7cf7f0, afd88dc6-14ef-4782-b50f-e1db3f442449, a820f4cb-5526-4556-826b-d0c9f5f9881e, f6d08522-9cdf-4c40-8845-ef8655db964e, 292d1824-6688-422f-8cff-5cab708ab3f3, 801eafab-0542-4c6a-842b-956adcab89f4, 0176c790-c2f6-4de9-b231-d1b1e23285ac, 00404802-6de4-4920-8cbd-7471670d4392, 1a12816a-b2c4-4d68-a9bd-f371223176d0, 88f6ae7c-a27a-46ee-b410-e0e7fceb0822, 9f439b06-3f86-4828-8161-e5265d201239, acf230f8-914f-4c53-a061-fc3e1994bf92, 6a2f7f67-2910-46ad-9652-ddcc30a3ba67, 8f20e32d-4d4e-4e72-a3fa-184e0a9bfe5c, 31c74e40-5b60-4b79-a341-975a9bb6832b, 106e41c6-2098-4fa0-8305-5f7f2c99245a, 99cc43a8-8b0a-4ffb-844f-65af399097f2, 84b7e2eb-1bf6-4c01-bd9d-e83628fcb372, 6d794d33-9819-4c90-94bd-83e9546c60f6, c6bafff0-fda3-46f2-936e-11a11da8500d, 292e684f-a0f6-4e3c-ab41-860874800576, 0eab4144-d059-43f5-8a96-9779448a4eae, 4e29d7c7-b6bf-4b6b-949b-78a842ab4c26, f87c6d7d-2eec-4ef9-95d2-7a7bc4e80c98, 1a028fe7-ecfd-43c7-b34c-074275e5de7f, 04f67535-18e2-44bc-8343-8bd154bbe195, aa8f328a-f550-4767-b74d-6f47893d3efe, 91a299f2-0723-46a2-ba10-3116878d119e, e9932b2f-16bf-4e35-a7f2-ed04e2764ca6, 79782fa3-28a5-4d6a-bf5c-528adb5bc4e7, 71f11978-ba06-48bb-87bd-ab0c0baf1e1a, 0be487b0-e2dd-4e60-a2e9-86e831a7b8d3, 97a2ada6-6630-4f3a-87ed-694bb00be07d, 23c67ca3-6aae-4e26-90db-07a849490eb7, 232773fc-d6c4-4b95-ae20-fe2d15cbdd7e, 8c422b5d-a571-41be-b9c0-1bf54efb1ca9, 0bc1f85d-026a-4ef2-bb92-a204f95593e6, d93a4b3f-7739-4e3e-a3f4-f0fa1638c99d, f0f7072c-bd1f-48e9-b5ed-27143cad87e3, cbb05021-3ad9-4639-ae2b-cc89d15359f9, 4d2857dd-1b97-4242-ac78-3901d06959d1, 9336448a-3171-4da6-8d7e-8111aaa86b03, 8046ff06-bb9c-4fc2-ad9d-aa89fcab87e1, 283cc90a-7397-49c9-a668-16664f6eb83f, 6906cbf2-832f-4443-8b16-f8fdbdc454d2, 73134710-f1be-44be-8358-791a01cb5cdc, fcf0fa40-d809-4afa-8f54-1163ff33acab, b51dd2a6-a9c3-4c13-b00d-0185acb3cc11, d8d4680f-61b3-4b38-bb2b-f049a27ef39e, 3cfa5756-08a7-4321-9684-8f40f4f33258, 
50641570-753c-49a2-83d9-fc29cf06ab4c, adf8aa38-7cdf-4502-a8de-3632bfeadbc7, 8dbf051d-7a95-4e80-9625-1f8df5036b90, b6775c30-8c93-4dd3-a1b2-c3c6a1fae5fc, 774934cf-4014-4ff5-ad80-ab19467720a9, 0e2e1829-e56c-4e63-9580-27e1db0d5992, e908812a-f8d8-45c7-9149-60df0069b772, a57c75c5-7fe7-4550-8d17-1a291430db5e, 890bfdfe-8564-4ec6-9143-3cfe3a7f3bb7, 4ee13920-8ff7-4f0b-ba87-d98cc47b4420, b5cc63b7-4fc6-43da-b7d4-2ae6087d2c92, fdfe3b9d-eb1c-4d45-998a-344e284e0134, 5f10e7a2-c9ce-4410-91c7-577ca9e688d7, 83420577-99d5-4c12-8332-c46c9d2d8cd4, ad4cafd6-29a3-4905-8a46-8cb62c3aac47, f542e332-18e7-45c7-a647-6819341250c4, ea547246-2597-49e8-abc0-b06c426a4271, 45a4be80-0758-4b15-b331-74582b1c6704, ed17924f-3385-4afe-86a7-73ff06640b75, 70027876-60e7-41b3-9f75-c9329153e8d6, 56a99b2e-3020-4d3b-a8bc-ace60b1a4327, 2da1f24b-e5ba-49e0-a405-46208ee2f32d, bbb33619-df86-435d-bda3-8c1d6a6e5e89, 7c9a0647-bc50-4575-80b3-eec19848fff5, e0c854fa-50ec-41ce-9bf3-614b9adc25ec, 88031eca-e2e7-450a-af85-839822971164, 08e0ca64-1e37-41a0-99d1-120c94ae1c91, 7afa0b5e-07e0-4238-9707-e6361414f065, f1dfa384-00bd-4643-be7d-462395f5e3f6, 37eb69d5-2d19-48c9-97d6-29b27dec299c, 843bde7c-7c54-45ea-aafa-545375fa1763, 78d79f85-c368-47a3-b231-5b0783aea27b, b440bb48-e214-4932-a3f9-160a79417ca3, 65266c5d-0108-4fe0-99a1-f824c470e0b5, 95caeed3-cda4-4f4c-b9b9-2e5c7a97ab6b, 79125b40-2f81-42c4-8383-bc8f012db906, 325676e7-dda7-473e-afa2-d6d3d09e7217, 465844a0-6b4e-41d8-bc63-f7780853619a, b0b12eb4-ffa6-4fb7-ab6f-eeb1f085a66e, 37c6ccd0-14b6-4e7b-a1ef-5b5fde24d1be, 8b7128a2-3f50-4e1f-b92b-fb52a081e408, f33caf20-fd02-4e01-b7e7-138e0518d1b6, aafb541c-e0d0-4345-8d5c-54856f22eec1, d585f3ad-297f-4b97-9d56-58920657636a, aca34e01-3e03-46f3-9a4b-ed653af6416a, 04cb40de-0bd5-4d51-bc15-e6f527894409, 537fea5d-58f6-4d69-97ad-eabbdc06e0f9, a2ebface-5ad6-4f08-ace9-332c3a249636, 168d953c-32e7-452a-b654-7f70ae80dca9, e31ae0d9-a864-4ee1-b462-a81a79236445, 501e4d17-e443-478f-ba42-11ae299ed391, 8befd795-d361-4e1f-bef8-69af8167780a, 
8d5c514c-c8a0-4a26-b8e0-0cc79fb1e27e, 47759878-2d1b-4f20-b504-6037fb817034, b8d40ac5-2943-4fff-9c2c-241048a5bf05, 18b3ca8c-c3e1-4e48-994a-e52bc5f4ba2f, 12ee5b1a-c0eb-4a7f-b87c-d229b5c631b9, 0750cc88-0cea-4276-bd86-3616dfb8bb04, 68ca441c-18ad-46ca-a46b-8cf0c6373bc4, beb99eda-275c-452c-a53e-2d35c22b7da3, 6e7b9aef-250e-41dc-afb5-c78d61e19f30, 49d4be0c-887d-458f-b97b-3e1d44783ffd, 8cee109d-f58f-4d1d-8666-15d6b03df745, cc90310c-c8b6-4280-8613-29b9ceaab0fa, a8d1cf92-9e51-4253-a848-de542607e524, 7744604b-f46f-4656-8217-090cd6042603, b5d9a984-2077-4430-9d73-85245e4efca4, a53426ce-4dcb-4278-b6c2-c4d1fffe0b74, f9867bcc-2b1f-4e3e-a75d-a638768df723, 843803c8-9916-4709-bc8d-e029cd7d4604, d2868f4e-7a30-4910-af9c-f2c4eccd399b]: Unexpected: BackendException: Error during search ou=Application Properties,cn=rudder-configuration SUB: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope SUB, and filter '(objectClass=property)' from server localhost:389.; cause was: com.unboundid.ldap.sdk.LDAPSearchException: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope SUB, and filter '(objectClass=property)' from server localhost:389. 
 -> com.normation.ldap.sdk.RoLDAPConnection.$anonfun$search$1(LDAPConnection.scala:321)
[2021-02-24 15:39:26+0000] WARN  explain_compliance.2da6b244-019b-4cc4-831e-f0aa0d7c7eba - Received a run at 2021-02-24T15:20:55.000Z for node '2da6b244-019b-4cc4-831e-f0aa0d7c7eba' configId '20200826-084123-9fa3d540' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-25e62ed
[2021-02-24 15:39:26+0000] WARN  explain_compliance.292d1824-6688-422f-8cff-5cab708ab3f3 - Received a run at 2021-02-24T15:24:19.000Z for node '292d1824-6688-422f-8cff-5cab708ab3f3' configId '20200826-100925-ac5525b9' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-fc114038
....

Subtasks

Bug #18961: typo in parent ticket (Released, François ARMAND)

Related issues

Related to Rudder - Bug #18982: on JVM8, there's no OOM log (In progress, François ARMAND)
#1

Updated by François ARMAND about 1 month ago

  • Status changed from New to In progress
#2

Updated by François ARMAND about 1 month ago

The problem seems to be that the "catch" part contains an e.printStackTrace() call, which requires memory allocation and therefore fails.

The only correct way to handle an OOM error is with the built-in JVM parameter -XX:+CrashOnOutOfMemoryError.
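
To illustrate why the in-process handler is fragile, here is a minimal sketch of an uncaught-exception handler that avoids e.printStackTrace() on the OOM path. It is not the fix applied in this ticket (the fix is the JVM parameter), and the class name, message, and injected "halt" callback are all hypothetical. Even with a pre-encoded message there is no guarantee that writing to stderr allocates nothing, which is the reason to prefer the JVM parameter.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

// Hypothetical sketch, not Rudder's actual handler.
final class FatalHandler implements Thread.UncaughtExceptionHandler {

    // Encoded once at class-load time, so the OOM path builds no new strings.
    private static final byte[] OOM_MESSAGE =
            "FATAL: OutOfMemoryError, halting JVM\n".getBytes(StandardCharsets.US_ASCII);

    // Injected so the sketch can be exercised without killing the JVM.
    private final IntConsumer halt;

    FatalHandler(IntConsumer halt) {
        this.halt = halt;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (error instanceof OutOfMemoryError) {
            try {
                // Write the pre-encoded bytes instead of formatting a stack trace.
                System.err.write(OOM_MESSAGE, 0, OOM_MESSAGE.length);
                System.err.flush();
            } finally {
                halt.accept(1); // stop even if the write itself failed
            }
        } else {
            error.printStackTrace(); // other errors: memory is assumed available
        }
    }

    public static void main(String[] args) {
        // Runtime.halt stops the JVM immediately, without running shutdown hooks.
        Thread.setDefaultUncaughtExceptionHandler(
                new FatalHandler(Runtime.getRuntime()::halt));
    }
}
```

Using Runtime.halt rather than System.exit is a deliberate choice in this sketch: the log above shows an OutOfMemoryError raised inside 'JettyShutdownThread', so an orderly shutdown can itself run out of memory.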

#3

Updated by François ARMAND about 1 month ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/2431
#4

Updated by François ARMAND about 1 month ago

  • Status changed from Pending technical review to Pending release
#5

Updated by Nicolas CHARLES about 1 month ago

On my test system, it stopped Rudder immediately; it didn't even have a chance to log the error.

#6

Updated by Nicolas CHARLES about 1 month ago

journalctl helpfully told me:

Mar 05 10:15:05 server su[898]: pam_unix(su:session): session opened for user root by vagrant(uid=0)
Mar 05 10:15:05 server systemd-coredump[773]: Process 22824 (java) of user 0 dumped core.

                                              Stack trace of thread 16893:
                                              #0  0x00007f6b7b0b293f raise (libc.so.6)
                                              #1  0x00007f6b7b09cc95 abort (libc.so.6)
                                              #2  0x00007f6b7a21e9df _ZN2os5abortEb.cold.54 (libjvm.so)
                                              #3  0x00007f6b7ab91163 _ZN7VMError14report_and_dieEv (libjvm.so)
                                              #4  0x00007f6b7a4fbdf3 _Z25report_java_out_of_memoryPKc (libjvm.so)
                                              #5  0x00007f6b7ab53948 _ZN14TypeArrayKlass15allocate_commonEibP6Thread (libjvm.so)
                                              #6  0x00007f6b7aa04720 _ZN11OptoRuntime11new_array_CEP5KlassiP10JavaThread (libjvm.so)
                                              #7  0x00007f6b6506d107 n/a (n/a)

#7

Updated by Nicolas CHARLES about 1 month ago

I do have a /opt/rudder/etc/rudder-jetty-base/hs_err_pid22824.log, but that's not so easy to spot.
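
Since the crash report is easy to miss, a small helper can list the hs_err_pid*.log files in a directory. This is a sketch only: the class name is hypothetical, and the directory to scan is passed as an argument (the path quoted above is simply where the file appeared on this test system).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Rudder.
final class HsErrFinder {

    /** Returns the HotSpot crash reports (hs_err_pid*.log) found directly in dir, sorted by path. */
    static List<Path> findCrashLogs(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        // The glob is matched against file names in dir only (no recursion).
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findCrashLogs(dir)) {
            System.out.println(p);
        }
    }
}
```

HotSpot also has an -XX:ErrorFile option to choose where this report is written; whether Rudder should set it is outside the scope of this ticket.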

#9

Updated by François ARMAND about 1 month ago

  • Description updated (diff)
#10

Updated by Vincent MEMBRÉ 24 days ago

This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released today.

#11

Updated by Vincent MEMBRÉ 24 days ago

  • Related to Bug #18982: on JVM8, there's no OOM log added
