Bug #14506
closed
Improve 'rsyslog' to manage larger load of reports
Description
We use DirectQueue in rsyslog, which is quite inefficient under very large loads.
Messages can be dropped during continuous bursts or under a sustained high volume of messages, especially on old versions of rsyslog.
We also need several workers to consume these messages, but this change sets a conservative 2 workers, to suit small installations.
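As a rough sketch of the kind of queue change described above, assuming the queue involved is rsyslog's main message queue (the ticket does not say whether it is the main queue or an action queue) and assuming RainerScript syntax; the configuration actually shipped with Rudder may differ:

```
# Sketch only, not the actual Rudder configuration.
# Replace the Direct queue with an in-memory LinkedList queue.
main_queue(
    queue.type="LinkedList"      # queue type proposed in this ticket
    queue.size="100000"          # queue size used in the measurements
    queue.workerThreads="2"      # conservative worker count for small installations
)
```

Larger installations would presumably raise `queue.workerThreads`, within the limits of the available CPUs.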
Some metrics:
Receiving 160 000 messages with a DirectQueue of 100 000 elements on rsyslog 8.1901.0 causes 45 000 messages to be dropped (at the UDP level), while with a LinkedList only 100 are lost.
To correctly handle a sustained 1400 messages/s, at least rsyslog 8.1901.0 is required, with a LinkedList of 100 000 elements, an enlarged UDP buffer (net.core.rmem_max=26214400 and net.core.rmem_default=26214400), and at least 8 CPUs, plus 2 for PostgreSQL.
A large part of the cost comes from the regular expression parsing of received messages (it can reach 200% CPU for the parsing alone), which prevents messages from being enqueued.
Note that the new protocol coming in Rudder 5.1 will not suffer from this issue, as syslog won't be used anymore.
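One possible way to make the UDP buffer settings above persistent is a sysctl drop-in file. The file name below is hypothetical; the two values are the ones given in this ticket:

```
# /etc/sysctl.d/90-rsyslog-udp.conf (hypothetical file name)
# 26214400 bytes = 25 MiB
net.core.rmem_max = 26214400
net.core.rmem_default = 26214400
```

The settings can then be loaded with `sysctl --system`, or set temporarily with `sysctl -w net.core.rmem_max=26214400` and `sysctl -w net.core.rmem_default=26214400`.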