Bug #18955
Closed
Fatal exception doesn't cause Rudder to stop anymore
Description
RESOLUTION NOTE: the provided patch uses the OpenJDK boot parameter "CrashOnOutOfMemoryError", which is only available since OpenJDK 1.8.0_92. You should really use a more recent OpenJDK than that for performance and security reasons, but if you are stuck with a six-year-old runtime, you can edit the file /opt/rudder/etc/rudder-jetty.conf and remove that parameter.
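Because the flag only exists from OpenJDK 8u92 onwards, it can help to check whether a given runtime recognises it before editing rudder-jetty.conf. The following is a minimal sketch, not part of the Rudder patch: it assumes a HotSpot-based JVM (such as OpenJDK) that exposes the standard `com.sun.management.HotSpotDiagnosticMXBean`, and the class name `OomFlagCheck` is hypothetical. Run it with the same `java` binary Rudder uses.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class OomFlagCheck {
    /**
     * Returns the current value of a HotSpot VM option ("true"/"false" for boolean flags),
     * or an empty Optional if this JVM does not recognise the option at all.
     */
    public static Optional<String> vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException unknownOption) {
            // getVMOption throws this when no option with that name exists,
            // e.g. CrashOnOutOfMemoryError on OpenJDK 8 builds older than 8u92.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> value = vmOption("CrashOnOutOfMemoryError");
        if (!value.isPresent()) {
            System.out.println("This JVM does not support -XX:+CrashOnOutOfMemoryError");
        } else {
            System.out.println("CrashOnOutOfMemoryError = " + value.get());
        }
    }
}
```

An empty result is the case that matters for older runtimes; otherwise the value reports whether the flag was actually passed when this JVM started.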
In #14281 we added a handler for fatal exceptions so that they force Rudder to stop (because from that point on, Rudder is broken in unknown ways).
But now, Rudder doesn't stop when they occur:
[2021-02-24T15:27:59.413Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:28:40.190Z] ERROR FATAL exception in thread 'pool-2-thread-14' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:29:01.906Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:30:02.031Z] ERROR FATAL exception in thread 'JettyShutdownThread' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:26.929Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.738Z] ERROR FATAL exception in thread 'Health Check Thread for LDAPConnectionPool(serverSet=SingleServerSet(server=localhost:389, includesAuthentication=false, includesPostConnectProcessing=false), maxConnections=2)' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:32:21.339Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.737Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.748Z] ERROR FATAL exception in thread 'zio-default-blocking-42' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:34:25+0000] ERROR net.liftweb.actor.ActorLogger - Actor threw an exception
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.747Z] ERROR FATAL exception in thread 'Connection reader for connection 141 to localhost:389' (in threadgroup 'zio-default-blocking'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24T15:34:25.766Z] ERROR FATAL Rudder JVM caught an unhandled fatal exception. Rudder will now stop to prevent further inconsistant behavior. This is likely a bug, please contact Rudder developers. You can configure the list of fatal exception in /opt/rudder/etc/rudder-web.properties -> rudder.jvm.fatal.exceptions
[2021-02-24T15:34:25.766Z] ERROR FATAL exception in thread 'pool-2-thread-12' (in threadgroup 'main'): 'java.lang.OutOfMemoryError': 'Java heap space'
java.lang.OutOfMemoryError: Java heap space
[2021-02-24 15:32:21+0000] WARN com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m37s705ms416µs741ns).
[2021-02-24 15:34:55+0000] INFO inventory-processing - Report 'era-afd88dc6-14ef-4782-b50f-e1db3f442449.ocs' for node 'era.ad.cullinanstudio.com' [afd88dc6-14ef-4782-b50f-e1db3f442449] (signature:certified) processed in 11 minutes, 1 second and 965 milliseconds ms
[2021-02-24 15:34:55+0000] WARN com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=5m37s929ms914µs588ns).
[2021-02-24 15:34:56+0000] INFO inventory-processing - Report 'cs-lxd-843bde7c-7c54-45ea-aafa-545375fa1763.ocs' for node 'cs-lxd.ad.cullinanstudio.com' [843bde7c-7c54-45ea-aafa-545375fa1763] (signature:certified) processed in 1 second and 263 milliseconds ms
[2021-02-24 15:39:26+0000] ERROR report.cache - Error when updating compliance cache for nodes: [83df0302-c646-48d4-960c-206eb7d6e76b, 6a7065bd-2857-44a5-a793-299aaf73f877, 64a9fb85-eb95-4e77-b473-bb36d6c755ce, 87257904-cd38-4f26-8bb7-86a95b616b31, 2f8a8a3e-e61e-493f-845b-411efd5e5cc1, eb746532-5c6d-41e9-a2cd-9cb5c359f754, dc4e6ebb-facd-4782-92e7-507c389c1d55, 98524e4d-bbe9-4cc5-a73a-6221ad316000, bc1f48e7-40ed-424f-864e-a36b0f600823, d4632740-3096-4a0f-a960-c58edcfe153f, 3fa39ba8-6134-4d15-a19c-47757443a3e6, ca75c6a1-3dcf-4f1c-a940-458d41955731, b01da0ae-a85e-43c6-90e2-0a97f6b962a9, 431d6b5b-3a63-491b-a545-dcd00149e74a, 0164debd-801f-4dbf-8286-b91f961db385, 012a5fa4-5cae-4a0f-aff0-8cef73605914, 68bfec3d-3148-48da-8936-00d6c6353bb8, 9d7a119e-dae5-401e-bd59-955189442d6e, 801ffa9b-da14-4f3a-9d79-7e25fff83a8b, 68eac959-2a13-48ff-9743-fb189da3751d, dd5da446-4465-4412-898d-d5fad325b057, 29bcf98b-7a98-4fd9-9c65-ca6f558523a1, 72d7d23a-49c3-40a9-b19e-ccca54f977b4, 1c4f8f91-8acf-4041-9182-15b47162c91f, ea1c1a5e-b035-4b30-af11-343d1ca80ac6, 16312315-eaf7-48f2-a738-2a8cc1a18277, 153b1a15-d18a-4382-99c5-cfe28c32040c, e6c5b022-531d-4ac7-ab42-074401be8b29, root, 6abcaf43-a53f-495c-92d9-092ae732f8b7, ebf278c9-18ae-4dc3-a860-0d4db34c0bed, 8651f72c-e34b-4fe9-9fc1-299faeb4f624, 
a32f001d-1eaf-41ad-bf32-e548e0f796b5, fdd9476f-7b5e-44f1-a0ad-71578250eacc, 900ecf39-cfd5-44e0-9d1d-1d539694fd05, 510bda45-631a-4708-a98e-5db91ee31f49, 7e9d00c1-949c-4666-b102-1bb3d7cb9d87, 193e9378-731b-4246-9a69-251b05f537c8, fff40e4d-694f-485e-b2c0-bacc43b8b482, 020dfd93-295d-408f-b934-34cee86d882f, 3ce75c4f-b041-4c15-877b-e3d441d83bd8, ad063ee1-6a7e-4edc-a028-f36b76cc17ea, bad452a7-36e7-4956-b29b-d2f58d2ad547, e27a8c6e-c5f2-480f-a7c7-05b1e271e078, 4b7b1e60-ee4d-44db-9409-f12b9f9de751, 0f18bc42-e1cc-4ee4-a0ef-b376eef08806, ef503e99-b19d-4a4a-aadc-fd57c64fcfa5, e2f589ae-d1c1-4a99-b67c-57d439f0efed, 2da6b244-019b-4cc4-831e-f0aa0d7c7eba, eac1ddac-df1a-4f18-8760-053a29b3ec57, f80fb2df-107a-4b1c-a1a7-af935b3d6dd2, 93b2bf80-9e3f-47ee-8a45-f4a570f5ea00, 39461107-981b-4e04-aabc-468d9f7cf7f0, afd88dc6-14ef-4782-b50f-e1db3f442449, a820f4cb-5526-4556-826b-d0c9f5f9881e, f6d08522-9cdf-4c40-8845-ef8655db964e, 292d1824-6688-422f-8cff-5cab708ab3f3, 801eafab-0542-4c6a-842b-956adcab89f4, 0176c790-c2f6-4de9-b231-d1b1e23285ac, 00404802-6de4-4920-8cbd-7471670d4392, 1a12816a-b2c4-4d68-a9bd-f371223176d0, 88f6ae7c-a27a-46ee-b410-e0e7fceb0822, 9f439b06-3f86-4828-8161-e5265d201239, acf230f8-914f-4c53-a061-fc3e1994bf92, 6a2f7f67-2910-46ad-9652-ddcc30a3ba67, 8f20e32d-4d4e-4e72-a3fa-184e0a9bfe5c, 31c74e40-5b60-4b79-a341-975a9bb6832b, 106e41c6-2098-4fa0-8305-5f7f2c99245a, 99cc43a8-8b0a-4ffb-844f-65af399097f2, 84b7e2eb-1bf6-4c01-bd9d-e83628fcb372, 6d794d33-9819-4c90-94bd-83e9546c60f6, c6bafff0-fda3-46f2-936e-11a11da8500d, 292e684f-a0f6-4e3c-ab41-860874800576, 0eab4144-d059-43f5-8a96-9779448a4eae, 4e29d7c7-b6bf-4b6b-949b-78a842ab4c26, f87c6d7d-2eec-4ef9-95d2-7a7bc4e80c98, 1a028fe7-ecfd-43c7-b34c-074275e5de7f, 04f67535-18e2-44bc-8343-8bd154bbe195, aa8f328a-f550-4767-b74d-6f47893d3efe, 91a299f2-0723-46a2-ba10-3116878d119e, e9932b2f-16bf-4e35-a7f2-ed04e2764ca6, 79782fa3-28a5-4d6a-bf5c-528adb5bc4e7, 71f11978-ba06-48bb-87bd-ab0c0baf1e1a, 0be487b0-e2dd-4e60-a2e9-86e831a7b8d3, 
97a2ada6-6630-4f3a-87ed-694bb00be07d, 23c67ca3-6aae-4e26-90db-07a849490eb7, 232773fc-d6c4-4b95-ae20-fe2d15cbdd7e, 8c422b5d-a571-41be-b9c0-1bf54efb1ca9, 0bc1f85d-026a-4ef2-bb92-a204f95593e6, d93a4b3f-7739-4e3e-a3f4-f0fa1638c99d, f0f7072c-bd1f-48e9-b5ed-27143cad87e3, cbb05021-3ad9-4639-ae2b-cc89d15359f9, 4d2857dd-1b97-4242-ac78-3901d06959d1, 9336448a-3171-4da6-8d7e-8111aaa86b03, 8046ff06-bb9c-4fc2-ad9d-aa89fcab87e1, 283cc90a-7397-49c9-a668-16664f6eb83f, 6906cbf2-832f-4443-8b16-f8fdbdc454d2, 73134710-f1be-44be-8358-791a01cb5cdc, fcf0fa40-d809-4afa-8f54-1163ff33acab, b51dd2a6-a9c3-4c13-b00d-0185acb3cc11, d8d4680f-61b3-4b38-bb2b-f049a27ef39e, 3cfa5756-08a7-4321-9684-8f40f4f33258, 50641570-753c-49a2-83d9-fc29cf06ab4c, adf8aa38-7cdf-4502-a8de-3632bfeadbc7, 8dbf051d-7a95-4e80-9625-1f8df5036b90, b6775c30-8c93-4dd3-a1b2-c3c6a1fae5fc, 774934cf-4014-4ff5-ad80-ab19467720a9, 0e2e1829-e56c-4e63-9580-27e1db0d5992, e908812a-f8d8-45c7-9149-60df0069b772, a57c75c5-7fe7-4550-8d17-1a291430db5e, 890bfdfe-8564-4ec6-9143-3cfe3a7f3bb7, 4ee13920-8ff7-4f0b-ba87-d98cc47b4420, b5cc63b7-4fc6-43da-b7d4-2ae6087d2c92, fdfe3b9d-eb1c-4d45-998a-344e284e0134, 5f10e7a2-c9ce-4410-91c7-577ca9e688d7, 83420577-99d5-4c12-8332-c46c9d2d8cd4, ad4cafd6-29a3-4905-8a46-8cb62c3aac47, f542e332-18e7-45c7-a647-6819341250c4, ea547246-2597-49e8-abc0-b06c426a4271, 45a4be80-0758-4b15-b331-74582b1c6704, ed17924f-3385-4afe-86a7-73ff06640b75, 70027876-60e7-41b3-9f75-c9329153e8d6, 56a99b2e-3020-4d3b-a8bc-ace60b1a4327, 2da1f24b-e5ba-49e0-a405-46208ee2f32d, bbb33619-df86-435d-bda3-8c1d6a6e5e89, 7c9a0647-bc50-4575-80b3-eec19848fff5, e0c854fa-50ec-41ce-9bf3-614b9adc25ec, 88031eca-e2e7-450a-af85-839822971164, 08e0ca64-1e37-41a0-99d1-120c94ae1c91, 7afa0b5e-07e0-4238-9707-e6361414f065, f1dfa384-00bd-4643-be7d-462395f5e3f6, 37eb69d5-2d19-48c9-97d6-29b27dec299c, 843bde7c-7c54-45ea-aafa-545375fa1763, 78d79f85-c368-47a3-b231-5b0783aea27b, b440bb48-e214-4932-a3f9-160a79417ca3, 65266c5d-0108-4fe0-99a1-f824c470e0b5, 
95caeed3-cda4-4f4c-b9b9-2e5c7a97ab6b, 79125b40-2f81-42c4-8383-bc8f012db906, 325676e7-dda7-473e-afa2-d6d3d09e7217, 465844a0-6b4e-41d8-bc63-f7780853619a, b0b12eb4-ffa6-4fb7-ab6f-eeb1f085a66e, 37c6ccd0-14b6-4e7b-a1ef-5b5fde24d1be, 8b7128a2-3f50-4e1f-b92b-fb52a081e408, f33caf20-fd02-4e01-b7e7-138e0518d1b6, aafb541c-e0d0-4345-8d5c-54856f22eec1, d585f3ad-297f-4b97-9d56-58920657636a, aca34e01-3e03-46f3-9a4b-ed653af6416a, 04cb40de-0bd5-4d51-bc15-e6f527894409, 537fea5d-58f6-4d69-97ad-eabbdc06e0f9, a2ebface-5ad6-4f08-ace9-332c3a249636, 168d953c-32e7-452a-b654-7f70ae80dca9, e31ae0d9-a864-4ee1-b462-a81a79236445, 501e4d17-e443-478f-ba42-11ae299ed391, 8befd795-d361-4e1f-bef8-69af8167780a, 8d5c514c-c8a0-4a26-b8e0-0cc79fb1e27e, 47759878-2d1b-4f20-b504-6037fb817034, b8d40ac5-2943-4fff-9c2c-241048a5bf05, 18b3ca8c-c3e1-4e48-994a-e52bc5f4ba2f, 12ee5b1a-c0eb-4a7f-b87c-d229b5c631b9, 0750cc88-0cea-4276-bd86-3616dfb8bb04, 68ca441c-18ad-46ca-a46b-8cf0c6373bc4, beb99eda-275c-452c-a53e-2d35c22b7da3, 6e7b9aef-250e-41dc-afb5-c78d61e19f30, 49d4be0c-887d-458f-b97b-3e1d44783ffd, 8cee109d-f58f-4d1d-8666-15d6b03df745, cc90310c-c8b6-4280-8613-29b9ceaab0fa, a8d1cf92-9e51-4253-a848-de542607e524, 7744604b-f46f-4656-8217-090cd6042603, b5d9a984-2077-4430-9d73-85245e4efca4, a53426ce-4dcb-4278-b6c2-c4d1fffe0b74, f9867bcc-2b1f-4e3e-a75d-a638768df723, 843803c8-9916-4709-bc8d-e029cd7d4604, d2868f4e-7a30-4910-af9c-f2c4eccd399b]: Unexpected: BackendException: Error during search ou=Application Properties,cn=rudder-configuration SUB: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope SUB, and filter '(objectClass=property)' from server localhost:389.; cause was: com.unboundid.ldap.sdk.LDAPSearchException: A client-side timeout was encountered while waiting 300000ms for a response to search request with message ID 288, base DN 'ou=Application Properties,cn=rudder-configuration', scope 
SUB, and filter '(objectClass=property)' from server localhost:389. -> com.normation.ldap.sdk.RoLDAPConnection.$anonfun$search$1(LDAPConnection.scala:321)
[2021-02-24 15:39:26+0000] WARN explain_compliance.2da6b244-019b-4cc4-831e-f0aa0d7c7eba - Received a run at 2021-02-24T15:20:55.000Z for node '2da6b244-019b-4cc4-831e-f0aa0d7c7eba' configId '20200826-084123-9fa3d540' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-25e62ed
[2021-02-24 15:39:26+0000] WARN explain_compliance.292d1824-6688-422f-8cff-5cab708ab3f3 - Received a run at 2021-02-24T15:24:19.000Z for node '292d1824-6688-422f-8cff-5cab708ab3f3' configId '20200826-100925-ac5525b9' which is not known by Rudder, and that node should be sending reports for configId 20210224-115717-fc114038
....
Updated by François ARMAND over 3 years ago
- Status changed from New to In progress
Updated by François ARMAND over 3 years ago
The problem seems to be that the "catch" part contains an e.printStackTrace() call, which requires memory allocation and therefore fails.
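To make the failure mode concrete, here is a hypothetical sketch of what an in-process handler would have to do to avoid allocating on the failure path: pre-compute its message bytes, skip e.printStackTrace(), and halt the JVM. It is not Rudder's actual handler; the class name `LowAllocFatalHandler` is made up, and it assumes `VirtualMachineError` (which includes `OutOfMemoryError`) as the set of fatal errors instead of the configurable rudder.jvm.fatal.exceptions list. Even this only minimises allocation, since writing to a stream is not guaranteed to be allocation-free, so it stays best-effort.

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

public class LowAllocFatalHandler implements Thread.UncaughtExceptionHandler {
    // Allocated once at class-load time, while the heap is still healthy.
    private static final byte[] FATAL_MSG =
        "FATAL: unrecoverable JVM error, halting\n".getBytes(StandardCharsets.US_ASCII);

    private final PrintStream err;
    private final IntConsumer halt; // injected so the handler can be tested without stopping the JVM

    public LowAllocFatalHandler(PrintStream err, IntConsumer halt) {
        this.err = err;
        this.halt = halt;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof VirtualMachineError) { // includes OutOfMemoryError
            try {
                // No string concatenation and no e.printStackTrace(): both allocate.
                err.write(FATAL_MSG, 0, FATAL_MSG.length);
                err.flush();
            } catch (Throwable ignored) {
                // Logging is best-effort; stopping the JVM is what matters.
            }
            halt.accept(1);
        } else {
            // Non-fatal errors: just print the stack trace and let this thread die.
            e.printStackTrace(err);
        }
    }
}
```

In production the hook would be `Runtime.getRuntime()::halt`, e.g. `Thread.setDefaultUncaughtExceptionHandler(new LowAllocFatalHandler(System.err, Runtime.getRuntime()::halt))`. `halt` is preferred over `System.exit` here because `exit` runs shutdown hooks, which may allocate in turn.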
The only correct way to handle an OOM error is with the built-in JVM parameter -XX:+CrashOnOutOfMemoryError.
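The flag matters because, by default, an uncaught OutOfMemoryError only terminates the thread it is raised in while the rest of the JVM keeps running, which is consistent with the logs above where individual threads die but Rudder stays up. The hypothetical `ThreadDeathDemo` below shows this with a hand-thrown error (the thread name is borrowed from the logs purely for illustration). It does not exercise the flag itself: a hand-thrown OutOfMemoryError does not go through the JVM's own out-of-memory reporting, so -XX:+CrashOnOutOfMemoryError would not react to it.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadDeathDemo {
    /**
     * Starts a worker that dies from an uncaught OutOfMemoryError and returns that error.
     * The calling thread is unaffected and keeps running afterwards.
     */
    public static Throwable runFailingWorker() throws InterruptedException {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Hand-thrown stand-in for a real heap exhaustion.
            throw new OutOfMemoryError("Java heap space (simulated)");
        }, "pool-2-thread-14");
        // Record the error instead of printing it; only this worker thread terminates.
        worker.setUncaughtExceptionHandler((t, e) -> cause.set(e));
        worker.start();
        worker.join();
        return cause.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable e = runFailingWorker();
        System.out.println("worker died with: " + e);
        System.out.println("main thread is still running");
    }
}
```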
Updated by François ARMAND over 3 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from François ARMAND to Vincent MEMBRÉ
- Pull Request set to https://github.com/Normation/rudder-packages/pull/2431
Updated by François ARMAND over 3 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder-packages|3da22c42ca70d404da2468f03092324ec83767e5.
Updated by Nicolas CHARLES over 3 years ago
On my test system, it stopped Rudder immediately; it didn't even have a chance to log the error.
Updated by Nicolas CHARLES over 3 years ago
journalctl helpfully told me:
Mar 05 10:15:05 server su[898]: pam_unix(su:session): session opened for user root by vagrant(uid=0)
Mar 05 10:15:05 server systemd-coredump[773]: Process 22824 (java) of user 0 dumped core.
Stack trace of thread 16893:
#0 0x00007f6b7b0b293f raise (libc.so.6)
#1 0x00007f6b7b09cc95 abort (libc.so.6)
#2 0x00007f6b7a21e9df _ZN2os5abortEb.cold.54 (libjvm.so)
#3 0x00007f6b7ab91163 _ZN7VMError14report_and_dieEv (libjvm.so)
#4 0x00007f6b7a4fbdf3 _Z25report_java_out_of_memoryPKc (libjvm.so)
#5 0x00007f6b7ab53948 _ZN14TypeArrayKlass15allocate_commonEibP6Thread (libjvm.so)
#6 0x00007f6b7aa04720 _ZN11OptoRuntime11new_array_CEP5KlassiP10JavaThread (libjvm.so)
#7 0x00007f6b6506d107 n/a (n/a)
Updated by Nicolas CHARLES over 3 years ago
I do have a /opt/rudder/etc/rudder-jetty-base/hs_err_pid22824.log, but it's not easy to spot.
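By default HotSpot writes its fatal error log as hs_err_pid<pid>.log in the process's working directory (falling back to the system temporary directory if that is not writable), which on this test system appears to be the Jetty base directory. A minimal, hypothetical helper (`HsErrFinder`) to list such files newest first could look like this; the default directory in `main` is taken from the comment above and is an assumption for other installations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HsErrFinder {
    // Matches HotSpot's default error-log file name: hs_err_pid<pid>.log
    private static final String PATTERN = "hs_err_pid\\d+\\.log";

    /** Lists hs_err_pid<pid>.log files directly inside dir, most recently modified first. */
    public static List<Path> find(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(p -> p.getFileName().toString().matches(PATTERN))
                .sorted((a, b) -> mtime(b).compareTo(mtime(a))) // newest first
                .collect(Collectors.toList());
        }
    }

    // Modification time, or the epoch if it cannot be read (e.g. file removed meanwhile).
    private static FileTime mtime(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            return FileTime.fromMillis(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the Jetty base directory mentioned in this ticket.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/rudder/etc/rudder-jetty-base");
        for (Path p : find(dir)) {
            System.out.println(p + "  (" + mtime(p) + ")");
        }
    }
}
```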
Updated by Nicolas CHARLES over 3 years ago
- Fix check changed from To do to Error - Next version
Updated by Vincent MEMBRÉ over 3 years ago
This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released today.
Updated by Vincent MEMBRÉ over 3 years ago
- Related to Bug #18982: on JVM8, there's no OOM log added
Updated by Vincent MEMBRÉ over 3 years ago
This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released by the end of October 2020.
Updated by Vincent MEMBRÉ over 3 years ago
- Status changed from Pending release to Released