Bug #7295
slapd core dumps on 1TB RAM
Status: closed
Description
When starting slapd on a system with 1TB of RAM, it writes a core dump due to a memory allocation issue.
OS: SLES11 SP3
Package: rudder-inventory-ldap-2.11.15.release-1.SLES.11
Memory: 1033740 Megabytes
++ sed -n '/^MemTotal/s/[^0-9]//gp' /proc/meminfo
+ MEMSIZE=1058549824
+ CACHESIZE=135490000
+ sed -ie '/^[ \t]*\(cachesize\|idlcachesize\)/d' /opt/rudder/etc/openldap/slapd.conf
+ sed -ie 's/^\([ \t]*suffix[ \t]\+"cn=rudder-configuration".*\)/\1\ncachesize 135490000/' /opt/rudder/etc/openldap/slapd.conf
+ sed -ie 's/^\([ \t]*suffix[ \t]\+"cn=rudder-configuration".*\)/\1\nidlcachesize 406470000/' /opt/rudder/etc/openldap/slapd.conf
[...]
56262153 ch_calloc of 1 elems of 18446744010302859784 bytes failed
slapd: ch_malloc.c:107: ch_calloc: Assertion `0' failed.
/etc/init.d/rudder-slapd: line 359: 58757 Aborted (core dumped) $SLAPD_BIN -h "$SLAPD_SERVICES" $SLAPD_PARAMS -d 1
Here is the backtrace from the core:
gdb -batch -ex bt /opt/rudder/libexec/slapd core.slapd.58757 bt
[...]
Core was generated by `/opt/rudder/libexec/slapd -h ldap://127.0.0.1:389 -n rudder-slapd -f /opt/rudde'.
Program terminated with signal 6, Aborted.
#0 0x00007fb98f48a885 in raise () from /lib64/libc.so.6
#0 0x00007fb98f48a885 in raise () from /lib64/libc.so.6
#1 0x00007fb98f48be61 in abort () from /lib64/libc.so.6
#2 0x00007fb98f483740 in __assert_fail () from /lib64/libc.so.6
#3 0x000000000044fc2c in ch_calloc ()
#4 0x000000000043dfa6 in attr_prealloc ()
#5 0x00007fb98cf551df in hdb_db_open (be=0x893400, cr=0x7ffd055011f0) at init.c:536
#6 0x0000000000443f7f in backend_startup_one ()
#7 0x000000000044428b in backend_startup ()
#8 0x000000000041a850 in main ()
When slapd is started with RUDDER_CACHESIZE="noauto" and the cachesize tunings are removed from the slapd config, it starts.
Running strace on slapd WITH memory tuning:
21745 mmap(NULL, 35921920, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f1a4673b000
21745 mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f1a46733000
21745 mmap(NULL, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1a46433000
21745 mmap(NULL, 10839203840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17c0621000
21745 mmap(NULL, 18446744010302861312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
21745 mmap(NULL, 18446744010302996480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
21745 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f17b8621000
21745 mmap(NULL, 18446744010302861312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
21745 --- SIGABRT (Aborted) @ 0 (0) ---
21745 +++ killed by SIGABRT (core dumped) +++
With all memory tuning removed, the allocations are much smaller:
29764 mmap(NULL, 35921920, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a511a3000
29764 mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a57813000
29764 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_SHARED, 10, 0) = 0x7f4a578d8000
29764 mmap(NULL, 66019328, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a4f4ef000
29764 mmap(NULL, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a0f4ef000
29764 mmap(NULL, 2359296, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a0f2af000
29764 mmap(NULL, 35921920, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a0d06d000
29764 mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7f4a0d065000
29764 mmap(NULL, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4a0cd65000
29764 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f4a0c864000
29765 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f4a0c063000
Somehow the calculated memory size does not suit slapd on a system this big, as it tries to allocate 18.4 exabytes (about 1.8 × 10^19 bytes) of RAM :-D
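A quick way to see why these requests are so large is to reinterpret them as signed 64-bit numbers. The sketch below is illustrative only: the three sizes are copied from the ch_calloc message and the failing mmap calls in this ticket, and the helper name as_signed64 is made up for this example.

```python
# Sizes copied from the ch_calloc message and the failing mmap calls in this ticket.
FAILED_SIZES = [
    18446744010302859784,  # ch_calloc request
    18446744010302861312,  # failing mmap (seen twice)
    18446744010302996480,  # other failing mmap
]


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


for size in FAILED_SIZES:
    signed = as_signed64(size)
    print(f"{size} -> {signed} bytes ({signed / 2**30:.1f} GiB)")
```

All three values come out negative, at roughly minus 59 GiB, which suggests a negative size was passed where an unsigned byte count was expected.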
Updated by Janos Mattyasovszky about 9 years ago
- Subject changed from slapd core dumps due malloc issue to slapd core dumps on 1TB RAM
- Description updated (diff)
Updated by Vincent MEMBRÉ about 9 years ago
- Assignee set to Benoît PECCATTE
Wow ! Don't know if it's a bug in slapd or not!
What do you think of it Benoit ?
Updated by Jonathan CLARKE about 9 years ago
The fact that it works OK with "noauto" indicates this is a bug in our memory calculation. We do this to dynamically adjust the database size for openldap, and obviously if we tell it to use 18.4 exabytes (!!) it will fail.
Now all I need to fix this is a machine with 1 TB of RAM :D
Updated by Benoît PECCATTE about 9 years ago
Well, we do not tell it to use 18.4 exabytes, we tell it to use 135490000 cache entries.
With a mean cache entry size of 800 bytes, this makes 135490000*800/1024/1024/1024 ≈ 100 GB of RAM.
It's a lot (probably too much), but it is not the size ldap tries to allocate (126Gb per cache entry).
This looks like an integer overflow in openldap (18*10^18 ~ 2^64).
Since 18446744010302859784-2^64 ≈ -60G, I'd say that it overflows at around 40 GB of cache.
So I suggest limiting the ldap cache to 32 GB to work around this.
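The arithmetic in this comment can be checked with a few lines of Python. The first part only uses numbers quoted in this ticket. The second part is a hypothesis, not something confirmed from the slapd source here: it assumes the preallocation count is computed as cachesize * 20 in a 32-bit signed int, with 40-byte elements plus an 8-byte header. Those three constants are assumptions, kept only because they reproduce the observed request exactly.

```python
CACHE_ENTRIES = 135490000        # cachesize from the startup trace
ENTRY_BYTES = 800                # mean entry size quoted in the comment
OBSERVED = 18446744010302859784  # size reported by ch_calloc


def wrap_int32(value: int) -> int:
    """Wrap to the signed 32-bit range, as a C int typically does on overflow."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def to_size_t(value: int) -> int:
    """Convert a (possibly negative) integer to an unsigned 64-bit size."""
    return value & 0xFFFFFFFFFFFFFFFF


# Part 1: the intended cache size, from the numbers in the comment.
expected_gib = CACHE_ENTRIES * ENTRY_BYTES / 2**30
print(f"intended cache: {expected_gib:.1f} GiB")

# Part 2: hypothetical overflow path (all three constants are assumptions).
ASSUMED_PER_ENTRY = 20     # assumed elements preallocated per cache entry
ASSUMED_ELEM_BYTES = 40    # assumed size of one element
ASSUMED_HEADER_BYTES = 8   # assumed extra pointer-sized header

elems = wrap_int32(CACHE_ENTRIES * ASSUMED_PER_ENTRY)
request = to_size_t(elems * ASSUMED_ELEM_BYTES + ASSUMED_HEADER_BYTES)
print(f"wrapped element count: {elems}")
print(f"matches observed request: {request == OBSERVED}")
```

If the hypothesis holds, the wrap happens in the element count rather than in the byte size, so the exact threshold depends on the multiplier and not only on the cache size in bytes.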
Updated by Janos Mattyasovszky about 9 years ago
If you have a patch, I can test it if you'd like.
However, if the limit will be max 32G, it should start on an arbitrarily big system having about 64G+ of RAM.
Updated by Jonathan CLARKE about 9 years ago
Janos Mattyasovszky wrote:
If you have a patch, I can test it if you'd like.
However, if the limit will be max 32G, it should start on an arbitrarily big system having about 64G+ of RAM.
The calculation is already proportional to the RAM size, and aims to use about 10%, thus the 100 GB.
We can set a limit well before that though, because that would be a huge cache (about 40M entries!).
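The proportional sizing with an upper bound can be sketched as follows. This is not the code from the init script or from the pull request: the function name cache_entries, its parameters, and the 32 GiB cap are illustrative assumptions based on the figures mentioned in this thread (about 10% of RAM, 800 bytes per entry, a 32 GB limit).

```python
GIB = 2**30


def cache_entries(mem_total_kb, ram_percent=10, entry_bytes=800, cap_bytes=32 * GIB):
    """Return a cache size in entries: a share of RAM, optionally capped.

    mem_total_kb is the MemTotal value from /proc/meminfo (in kB).
    Pass cap_bytes=None to disable the cap.
    """
    budget = mem_total_kb * 1024 * ram_percent // 100
    if cap_bytes is not None:
        budget = min(budget, cap_bytes)
    return budget // entry_bytes


mem_kb = 1058549824  # MEMSIZE from the trace in this ticket
print("uncapped:", cache_entries(mem_kb, cap_bytes=None))
print("capped:  ", cache_entries(mem_kb))
```

Without the cap, the result is close to the 135490000 seen in the trace (the real script apparently rounds differently). With a 32 GiB cap it drops to roughly 43 million entries, in line with the "about 40M entries" figure.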
Updated by Jonathan CLARKE about 9 years ago
- Related to Bug #5965: LDAP configuration is not optimized for Rudder use case added
Updated by Jonathan CLARKE about 9 years ago
- Status changed from New to In progress
- Assignee changed from Benoît PECCATTE to Jonathan CLARKE
Updated by Jonathan CLARKE about 9 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Jonathan CLARKE to Benoît PECCATTE
- Pull Request set to https://github.com/Normation/rudder-packages/pull/813
Updated by Jonathan CLARKE about 9 years ago
Janos Mattyasovszky wrote:
If you have a patch, I can test it if you'd like.
Thanks! Could you test this patch and let us know? https://patch-diff.githubusercontent.com/raw/Normation/rudder-packages/pull/813.patch
Updated by Jonathan CLARKE about 9 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset rudder-packages|239789f041d1fd48066f277566c4525461dc2abb.
Updated by Benoît PECCATTE about 9 years ago
Applied in changeset rudder-packages|719a37c7c87a8c5971e79747f59240a8c92418f9.
Updated by Vincent MEMBRÉ about 9 years ago
- Status changed from Pending release to Released