Project

General

Profile

Actions

User story #2661

closed

Add a promise in common to prevent from too many cf-execd

Added by Nicolas PERRON almost 12 years ago. Updated over 11 years ago.

Status:
Released
Priority:
2
Assignee:
Matthieu CERDA
Category:
System techniques
Target version:
UX impact:
Suggestion strength:
User visibility:
Effort required:
Name check:
Fix check:
Regression:

Description

In order to prevent a lot of cf-execd process (like #2654) or other process, we should add a new promise in common which check number of processes.


Related issues 1 (0 open1 closed)

Related to Rudder - Bug #2654: Loads of cf-execd leads to OOM killerReleased2012-07-15Actions
Actions #1

Updated by Matthieu CERDA almost 12 years ago

  • Status changed from New to In progress
  • Assignee changed from Jonathan CLARKE to Matthieu CERDA

I'm taking this :D

Actions #2

Updated by Matthieu CERDA almost 12 years ago

  • Target version changed from 47 to 50
Actions #3

Updated by Matthieu CERDA almost 12 years ago

  • Status changed from In progress to Pending technical review
  • % Done changed from 0 to 100

Applied in changeset commit:829bd037cb1295585e5bccdc585c28f099814113.

Actions #4

Updated by Nicolas PERRON almost 12 years ago

It seems OK to me but I have some questions:
  • Why did you modified from "escaped_workdir" to "g.escaped_workdir" ?
  • Shouldn't we return a success if the number of processes are OK ?
Actions #5

Updated by Matthieu CERDA almost 12 years ago

Nicolas PERRON wrote:

It seems OK to me but I have some questions:
  • Why did you modified from "escaped_workdir" to "g.escaped_workdir" ?

Because the original declaration was working by miracle, and a small correction to make sure things work right is worth nothing. It is trivial, really.

  • Shouldn't we return a success if the number of processes are OK ?

Well, I did not want to declare a new section in the metadata.xml as it is what we already did for some bundles. But it does not really make a difference to me, so if you think that an extra report is needed, I will add it gladly.

Actions #6

Updated by Matthieu CERDA almost 12 years ago

  • Status changed from Pending technical review to Discussion
  • Assignee changed from Matthieu CERDA to Nicolas PERRON
Actions #7

Updated by Matthieu CERDA almost 12 years ago

  • Assignee changed from Nicolas PERRON to Jonathan CLARKE

<message deleted because it was unrelated to this issue>

Actions #8

Updated by Matthieu CERDA almost 12 years ago

  • Assignee changed from Jonathan CLARKE to Nicolas PERRON
Actions #9

Updated by Jonathan CLARKE almost 12 years ago

  • Assignee changed from Nicolas PERRON to Matthieu CERDA
  • % Done changed from 100 to 70

Matthieu CERDA wrote:

Nicolas PERRON wrote:

It seems OK to me but I have some questions:
  • Why did you modified from "escaped_workdir" to "g.escaped_workdir" ?

Because the original declaration was working by miracle, and a small correction to make sure things work right is worth nothing. It is trivial, really.

No, this is not trivial. It is a full blown bug, that should be reported as such, and affects all Windows machines. Please revert this change in this bug, and open a new ticket to trace it. However, it is well spotted, thanks!

  • Shouldn't we return a success if the number of processes are OK ?

Well, I did not want to declare a new section in the metadata.xml as it is what we already did for some bundles. But it does not really make a difference to me, so if you think that an extra report is needed, I will add it gladly.

I agree, we should have a new section for reporting on this. Also, your result_error should be a result_repaired, since you killed the excess cf-execd(s), so you repaired the problem. It would good to have a success report as well, for the case that everything is fine.

Finally, why did you redefine a body "body process_count system_check_process_count" while there is an almost identical one in the stdlib called "body process_count check_range"? We should use stdlib definitions as far as possible.

Actions #10

Updated by Jonathan CLARKE almost 12 years ago

  • Target version changed from 50 to 2.4.0~beta3
Actions #11

Updated by Nicolas CHARLES almost 12 years ago

Moreover, using the if_else result in having the class execd_is_tame defined if there's been a repair (killing all the processes)

And why are you sending term and kill, rather than using the process_stop to send a command to shut down more gracefully the executor ?

Actions #12

Updated by Matthieu CERDA almost 12 years ago

  • Status changed from Discussion to In progress

As seen with NCH and JCL, we will rather define two steps instead of term/kill'ing the execd processes when 3 execd are detected: sigterm will be sent if there are 3 execd, and after 5 execd we will send sigkills.

Also, I will adjust the reporting.

Actions #13

Updated by Matthieu CERDA almost 12 years ago

  • Status changed from In progress to Pending technical review
  • % Done changed from 70 to 100

Applied in changeset commit:089d31e6351d7c30ba5ed78d4bb37f28d1391665.

Actions #14

Updated by Matthieu CERDA almost 12 years ago

Strange, Redmine sees the commit, but outputs a broken link...

Actions #15

Updated by Nicolas CHARLES over 11 years ago

  • Status changed from Pending technical review to 10

This looks valid, thank you Matthieu

Actions #16

Updated by Jonathan CLARKE over 11 years ago

  • Status changed from 10 to Released
Actions

Also available in: Atom PDF