Bug #5461

closed

Slapd base corruption leads to slow app (for ex node deletion is terribly slow)

Added by Lionel Le Folgoc almost 10 years ago. Updated about 7 years ago.

Status: Rejected
Priority: 3
Assignee: -
Category: Performance and scalability
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 0
Name check:
Fix check:
Regression:

Description

Hi,

Rudder 2.11.2, distributed setup.

My instance had 1000 nodes, and I wrote a bash script to delete all of them through the API.
In 15 hours, only 600 nodes were deleted... that's less than one node per minute... :(
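For reference, the throughput above works out to roughly 90 seconds per deleted node:

```latex
\[
\frac{15\ \text{h} \times 3600\ \text{s/h}}{600\ \text{nodes}}
  = \frac{54\,000\ \text{s}}{600\ \text{nodes}}
  = 90\ \text{s/node}
  \approx 0.67\ \text{nodes/min}
\]
```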

Moreover, the slapd log is full of the following message (several times per second):

Sep 30 08:28:25 rudderldap01 rudder-slapd1248: connection_input: conn=1004 deferring operation: too many executing

Finally, reinitialising the slapd server makes things fast again.

Actions #1

Updated by Matthieu CERDA almost 10 years ago

  • Project changed from 24 to Rudder
  • Category set to Web - Nodes & inventories
  • Status changed from New to 8
  • Assignee set to François ARMAND
  • Priority changed from N/A to 1
  • Target version set to 2.11.3

Ouch. That's indeed very slow...

Can you take a look at this, François? Thanks :)

Actions #2

Updated by Matthieu CERDA almost 10 years ago

  • Category changed from Web - Nodes & inventories to Performance and scalability
Actions #3

Updated by François ARMAND almost 10 years ago

This is indeed terribly slow, and I will look into it.

But it seems that the major problem is the missing API call "bulk delete node1, node2, .... node5625".
I suspect that some part of the latency is due to 1/ the network and 2/ generating rules, and neither of these costs should have to be paid for every single node.
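A rough cost model makes this argument explicit. It assumes (this was not measured in the issue) that each API deletion pays a fixed network cost $t_{\text{net}}$ and a rule-generation cost $t_{\text{gen}}$ on top of the per-node LDAP work $t_{\text{del}}$. Deleting $N$ nodes one call at a time, versus in a single hypothetical bulk call, then costs:

```latex
\begin{align*}
T_{\text{one-by-one}}(N) &= N\,(t_{\text{net}} + t_{\text{gen}} + t_{\text{del}}) \\
T_{\text{bulk}}(N)       &= t_{\text{net}} + t_{\text{gen}} + N\,t_{\text{del}} \\
T_{\text{one-by-one}}(N) - T_{\text{bulk}}(N) &= (N-1)\,(t_{\text{net}} + t_{\text{gen}})
\end{align*}
```

Batching only removes the repeated fixed costs. If $t_{\text{del}}$ itself is large, a bulk call does not help with that part.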

Actions #4

Updated by François ARMAND almost 10 years ago

  • Status changed from 8 to In progress
Actions #5

Updated by Lionel Le Folgoc almost 10 years ago

> But it seems that the major problem is the missing API call "bulk delete node1, node2, .... node5625".

(That would be nice; then again, I don't think I'm going to mass-delete my nodes every day ;-)

> I suspect that some part of the latency is due to 1/ the network and 2/ generating rules, and neither of these costs should have to be paid for every single node.

1/ I'm launching the script directly on the policy server.
2/ A full rule regeneration takes 20 minutes here, so it is probably not the cause of the 1-minute delay (and my script doesn't trigger a rule generation anyway).

Actions #6

Updated by François ARMAND almost 10 years ago

I missed the 20 minutes. Wow, that is awful. Which version of Rudder is this? Could you please add the line

<logger name="com.normation.rudder.services.policies.DeploymentServiceImpl" level="debug" />

to the logback.xml configuration file, so that we get timing information about what is so slow?

Could you also please run the following script so that we get some insight into the size of your installation: https://github.com/fanf/rudder-tools/blob/rudder_server_metric_reporting_script/scripts/rudder_metrics_reporting.sh

Thanks,

Actions #7

Updated by Matthieu CERDA almost 10 years ago

  • Target version changed from 2.11.3 to 2.11.4
Actions #8

Updated by François ARMAND almost 10 years ago

I got some information, though I'm not sure it is for that system:

Number of expected reports (components*directives*nodes): 17848
Number of rules: 6
Number of directives: 6
Number of nodes: 5002
Number of reports for one day: 36993
Report database size: 114 MB
Number of lines in reports table: 108464
Full database size: 2192 MB
Archiving reports:
  archive.TTL=3
  delete.TTL=90
  frequency=daily

On a 4 GB (EC2) server.

So, in this regard, I think we are reaching the limits of the system, and that is what is causing the über-slowness. With a bit of bad luck, the system is also swapping, making everything even slower.

Promise generation takes so long because the cf-promises run time grows linearly with the number of nodes. In such an installation, #4427 is a showstopper.

So, first thing to do: let users know what server size to use for such big installations. We recommend 2 GB for the JVM for 500 nodes. Of course, things are not linear, but for 5000 nodes, a multi-server installation with an 8 GB server for the webapp machine, with 6 GB for the JVM, seems to be a minimum. This is a first rough estimate (it could be a big underestimate if memory consumption turns out to be linear or worse), and we will work on better sizing metrics.
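To see why the 6 GB JVM figure is hedged, compare it with a naive linear extrapolation of the 500-node recommendation. This is an illustration only, since the comment above already says memory use is not expected to be linear:

```latex
\[
2\ \text{GB} \times \frac{5000\ \text{nodes}}{500\ \text{nodes}} = 20\ \text{GB}
\qquad\text{vs.}\qquad 6\ \text{GB proposed}
\]
```

If memory consumption did scale linearly or worse, the JVM would need more than three times the proposed heap. That is the "big underestimate" case.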

Actions #9

Updated by Lionel Le Folgoc almost 10 years ago

I later discovered that slapd was wrecked on the server (at some point, Rudder runtime logs filled the partition, and slapd was left in a bizarre state because of that).

I have reinitialised everything since then, so when I'm done with the current tests, I'll try a deletion again. I expect it to be significantly faster.

Actions #10

Updated by Lionel Le Folgoc almost 10 years ago

Okay, I retried with 5000 accepted nodes, and deleting one node through the API took less than 4 seconds, so I guess slapd was in a weird state and was the cause of the slowdown.
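Comparing the two measurements in this issue gives an idea of how large the slowdown was. The first run deleted 600 nodes in 15 hours, about 90 s per node. After slapd was reinitialised, a deletion took under 4 s. One caveat applies: the runs were on installations of different sizes (1000 vs. 5000 nodes).

```latex
\begin{align*}
t_{\text{before}} &= \frac{15 \times 3600\ \text{s}}{600} = 90\ \text{s/node},
  \qquad t_{\text{after}} < 4\ \text{s/node} \\
\frac{t_{\text{before}}}{t_{\text{after}}} &> \frac{90}{4} = 22.5 \\
1000 \times 4\ \text{s} &= 4000\ \text{s} \approx 67\ \text{min (upper bound)}
  \quad\text{vs.}\quad 1000 \times 90\ \text{s} = 25\ \text{h}
\end{align*}
```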
Thanks, feel free to close this issue.

Actions #11

Updated by François ARMAND almost 10 years ago

OK, that seems far better!

I think a lot of other massive slowdowns must have been present while slapd was in that state (unresponsive UI, etc.).

So we have to find a way to detect automatically when this is happening.

Actions #12

Updated by Lionel Le Folgoc almost 10 years ago

The slapd log was full of lines similar to this one (several times per second):
Sep 30 08:28:25 rudderldap01 rudder-slapd1248: connection_input: conn=1004 deferring operation: too many executing
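Below is a minimal sketch of a log-based check for detecting this state automatically. It assumes slapd output goes to syslog in the classic format shown above. It counts the "deferring operation: too many executing" messages per one-second timestamp and flags slapd when several seconds in the log each contain more than one of them. The function names, the thresholds (`max_per_second`, `min_seconds`) and the sample lines are hypothetical illustrations. They are not part of Rudder or OpenLDAP. Classic syslog timestamps carry no year, so identical timestamps from different years would share a bucket.

```python
import re
from collections import Counter
from typing import Iterable

# Message slapd logs when it defers an operation because too many are
# already executing (text copied from the log line quoted in this issue).
DEFER_MARKER = "deferring operation: too many executing"

# Classic syslog timestamp prefix, e.g. "Sep 30 08:28:25" (1-second resolution).
TIMESTAMP_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s")


def deferrals_per_second(lines: Iterable[str]) -> Counter:
    """Count deferral messages per syslog timestamp (one-second buckets)."""
    counts: Counter = Counter()
    for line in lines:
        if DEFER_MARKER not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def slapd_looks_saturated(
    lines: Iterable[str], max_per_second: int = 1, min_seconds: int = 5
) -> bool:
    """Hypothetical heuristic: flag slapd when at least `min_seconds` distinct
    seconds each logged more than `max_per_second` deferral messages."""
    counts = deferrals_per_second(lines)
    busy_seconds = sum(1 for n in counts.values() if n > max_per_second)
    return busy_seconds >= min_seconds


if __name__ == "__main__":
    # Illustrative lines modeled on the message quoted in this issue.
    sample = [
        "Sep 30 08:28:25 rudderldap01 rudder-slapd1248: connection_input: conn=1004 deferring operation: too many executing",
        "Sep 30 08:28:25 rudderldap01 rudder-slapd1248: connection_input: conn=1005 deferring operation: too many executing",
        "Sep 30 08:28:26 rudderldap01 rudder-slapd1248: conn=1006 op=2 UNBIND",
    ]
    print(deferrals_per_second(sample))  # Counter({'Sep 30 08:28:25': 2})
    print(slapd_looks_saturated(sample))  # False: only one busy second
```

In practice you would pass an open log file, for example `with open(path) as f: slapd_looks_saturated(f)`. The log path depends on the local syslog setup. A log scan like this only catches the symptom after the fact. An active probe that times LDAP requests would be a complementary approach.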

Actions #13

Updated by François ARMAND almost 10 years ago

  • Subject changed from Node deletion is terribly slow to Slapd base corruption lead to slow app (for ex node deletion is terribly slow)
  • Description updated (diff)

Quite interesting.

Actions #14

Updated by François ARMAND almost 10 years ago

  • Status changed from In progress to 8
  • Priority changed from 1 to 3
Actions #15

Updated by François ARMAND almost 10 years ago

  • Subject changed from Slapd base corruption lead to slow app (for ex node deletion is terribly slow) to Slapd base corruption leads to slow app (for ex node deletion is terribly slow)
Actions #16

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.4 to 2.11.5
Actions #17

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.5 to 2.11.6
Actions #18

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.6 to 2.11.7
Actions #19

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.7 to 2.11.8
Actions #20

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.8 to 2.11.9
Actions #21

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.9 to 2.11.10
Actions #22

Updated by Benoît PECCATTE over 9 years ago

  • Status changed from 8 to New
Actions #23

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.10 to 2.11.11
Actions #24

Updated by Vincent MEMBRÉ about 9 years ago

  • Target version changed from 2.11.11 to 2.11.12
Actions #25

Updated by Vincent MEMBRÉ about 9 years ago

  • Target version changed from 2.11.12 to 2.11.13
Actions #26

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 2.11.13 to 2.11.14
Actions #27

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 2.11.14 to 2.11.15
Actions #28

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 2.11.15 to 2.11.16
Actions #29

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.16 to 2.11.17
Actions #30

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.17 to 2.11.18
Actions #31

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.18 to 2.11.19
Actions #32

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.19 to 2.11.20
Actions #33

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.20 to 2.11.21
Actions #34

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 2.11.21 to 2.11.22
Actions #35

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 2.11.22 to 2.11.23
Actions #36

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 2.11.23 to 2.11.24
Actions #37

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 2.11.24 to 308
Actions #38

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 308 to 3.1.14
Actions #39

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 3.1.14 to 3.1.15
Actions #40

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 3.1.15 to 3.1.16
Actions #41

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 3.1.16 to 3.1.17
Actions #42

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.17 to 3.1.18
Actions #43

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.18 to 3.1.19
Actions #44

Updated by François ARMAND over 7 years ago

  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority set to 0
Actions #45

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.19 to 3.1.20
Actions #46

Updated by Jonathan CLARKE about 7 years ago

  • Assignee deleted (François ARMAND)
Actions #47

Updated by Vincent MEMBRÉ about 7 years ago

  • Target version changed from 3.1.20 to 3.1.21
Actions #48

Updated by Vincent MEMBRÉ about 7 years ago

  • Target version changed from 3.1.21 to 3.1.22
Actions #49

Updated by Benoît PECCATTE about 7 years ago

  • Status changed from New to Rejected

The problem seems solved.
