Bug #6118

closed

cf-agent execution is killed if cf-execd is restarted (for example on a 5 a.m. daily restart)

Added by Nicolas CHARLES almost 10 years ago. Updated about 7 years ago.

Status: Rejected
Priority: 2
Category: System techniques
Target version:
Severity: Critical - prevents main use of Rudder | no workaround | data loss | security
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 63
Name check:
Fix check:
Regression:

Description

We have a promise to restart services at 5am every day. Unfortunately, it does not behave as expected: stopping the executor daemon apparently kills the agent, causing it to skip a run (this is especially visible with the advanced reporting plugin).

In 2.10, it merely truncates the logs in the outputs folder, but in 2.11 and 3.0 it skips the runs completely.


Related issues: 1 (0 open, 1 closed)

Related to Rudder - Bug #7274: The daily cf-execd and cf-serverd restart should use SRC on AIX (Released; Benoît PECCATTE; 2015-10-13)
#1

Updated by Vincent MEMBRÉ almost 10 years ago

  • Target version changed from 2.11.6 to 2.11.7
#2

Updated by Vincent MEMBRÉ almost 10 years ago

  • Target version changed from 2.11.7 to 2.11.8
#3

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.8 to 2.11.9
#4

Updated by François ARMAND over 9 years ago

  • Status changed from New to Discussion
  • Assignee set to Nicolas CHARLES

Just to be sure I understand the bug here: the problem is not the 5am stop, it's that if we restart cf-execd, then cf-agent is killed (so if a run was ongoing, it is killed). Is that it?

#5

Updated by Nicolas CHARLES over 9 years ago

  • Assignee changed from Nicolas CHARLES to François ARMAND

From the user's point of view, the problem is that every day at 5 am there is a dent in reporting, with "no answer" everywhere.

The cause is that the agent kills cf-execd every day at 5 am (it's part of the promises), and as a result this kills the agent that was launched by cf-execd, interrupting the run.

Killing cf-execd should probably not kill cf-agent, but I'm not sure about this one.

#6

Updated by François ARMAND over 9 years ago

  • Subject changed from "cf-agent execution is killed every day at 5am" to "cf-agent execution is killed if cf-execd is restarted (for example on a 5 a.m. daily restart)"

I think they should be independent, but I don't quite see the implications here.

Well, in any case, I only see two possibilities:

- either cf-execd and cf-agent are independent, and killing the parent does not kill the child;
- or they are not (and we don't want them to be), and we need a graceful restart for cf-execd. That may be extremely tricky, because deciding between a stalled cf-agent and one that is just taking a long time to finish its tasks may not have a definitive answer...

I think that cf-execd manages a lot of things (sending emails, collecting execution statistics, etc.), so perhaps we won't be able to split them, even if we wanted to.

For now, I don't see any other option. Perhaps we should just not kill the process every day, and consider a kill for what it is: an interruption.
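The first possibility listed in the note above (a cf-agent run that survives cf-execd) can be illustrated on POSIX with a minimal Python sketch. It is hypothetical and is not how cf-execd actually launches cf-agent: `launch_run` is an invented name and a Python sleep stands in for an agent run. Signalling only a parent's PID does not, by itself, terminate its children; they usually die because the signal targets the whole process group, or because a service manager such as systemd kills the whole control group. The thread does not establish which mechanism applies here. Starting the child with `start_new_session=True` puts it in its own session and process group, so it escapes signals aimed at its launcher's process group, though not a cgroup-wide kill.

```python
import os
import subprocess
import sys


def launch_run(detached: bool) -> subprocess.Popen:
    """Start a hypothetical stand-in for an agent run (a 30 s Python sleep)."""
    # With start_new_session=True the child calls setsid() before exec, so it
    # leaves the launcher's session and process group: signals sent to the
    # launcher's process group no longer reach it (POSIX only).
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=detached,
    )


if __name__ == "__main__":
    for detached in (False, True):
        run = launch_run(detached)
        shares_group = os.getpgid(run.pid) == os.getpgid(0)
        print(f"detached={detached}: in launcher's process group: {shares_group}")
        run.kill()  # clean up the stand-in run
        run.wait()
    # prints:
    # detached=False: in launcher's process group: True
    # detached=True: in launcher's process group: False
```

Detaching only changes which signals reach the run; it does not answer the other concerns raised above, such as cf-execd collecting the run's output and statistics.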

#7

Updated by Nicolas CHARLES over 9 years ago

The issue here is not about a stalled agent; it's really that we are preventively restarting cf-execd every day, and this kills the agent.

Restarting the executor daemon used to be part of the default CFEngine masterfiles, but it no longer is (at least since 08/2013).
So maybe we should remove this preventive restart.

#8

Updated by François ARMAND over 9 years ago

Nicolas CHARLES wrote:

> The issue here is not about a stalled agent; it's really that we are preventively restarting cf-execd every day, and this kills the agent.

I understand that; what made you think I was talking about a stalled agent for the restart? With a graceful restart, you basically have 3 cases to consider:

- "please stop": [OK, I'm stopped]
- "please stop": [nothing happens: stalled, or just finishing?]
  - OK, after some time / some retries, it's stopped;
  - even after some time / some retries, it is not stopped => stalled or not? When to kill?

> Restarting the executor daemon used to be part of the default CFEngine masterfiles, but it no longer is (at least since 08/2013).
> So maybe we should remove this preventive restart.

That could be the workaround for that.
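The three graceful-restart cases listed in the reply above can be sketched as a generic stop-with-retries routine. This is a minimal Python sketch, not Rudder's or CFEngine's code: `graceful_stop`, the retry count and the per-attempt timeout are hypothetical. It makes the "when to kill?" question explicit: the final SIGKILL is only an arbitrary timeout policy, because the caller cannot tell a stalled process from a slow one.

```python
import signal
import subprocess
import sys


def graceful_stop(proc: subprocess.Popen, retries: int = 3,
                  wait_seconds: float = 2.0) -> str:
    """Ask proc to stop, retry, and send SIGKILL only as a last resort."""
    for attempt in range(1, retries + 1):
        proc.send_signal(signal.SIGTERM)  # "please stop"
        try:
            proc.wait(timeout=wait_seconds)
        except subprocess.TimeoutExpired:
            continue  # nothing happened: stalled, or just finishing?
        # Case 1 (first request) or case 2a (after some retries).
        return "stopped" if attempt == 1 else "stopped after retries"
    # Case 2b: still running after every retry. We cannot tell a stalled
    # process from a slow one, so killing it here is a policy choice.
    proc.kill()  # SIGKILL on POSIX
    proc.wait()
    return "killed"


if __name__ == "__main__":
    # Stand-in run that exits on SIGTERM (default signal action).
    polite = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(graceful_stop(polite))  # stopped

    # Stand-in run that ignores SIGTERM; it prints "ready" once the handler
    # is installed, so we do not signal it before that.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(30)"],
        stdout=subprocess.PIPE, text=True,
    )
    stubborn.stdout.readline()
    print(graceful_stop(stubborn, retries=2, wait_seconds=0.2))  # killed
    stubborn.stdout.close()
```

Whatever values are chosen for `retries` and `wait_seconds`, a long but healthy run that exceeds them is still interrupted, which is why removing the daily restart altogether was considered.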

#9

Updated by François ARMAND over 9 years ago

  • Assignee changed from François ARMAND to Nicolas CHARLES
  • Reproduced set to No
#10

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.9 to 2.11.10
#11

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.10 to 2.11.11
#12

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.11 to 2.11.12
#13

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.12 to 2.11.13
#14

Updated by Vincent MEMBRÉ over 9 years ago

  • Target version changed from 2.11.13 to 2.11.14
#15

Updated by Vincent MEMBRÉ about 9 years ago

  • Target version changed from 2.11.14 to 2.11.15
#16

Updated by Nicolas CHARLES about 9 years ago

  • Related to Bug #7274: The daily cf-execd and cf-serverd restart should use SRC on AIX added
#17

Updated by Vincent MEMBRÉ about 9 years ago

  • Target version changed from 2.11.15 to 2.11.16
#18

Updated by Vincent MEMBRÉ about 9 years ago

  • Target version changed from 2.11.16 to 2.11.17
#19

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 2.11.17 to 2.11.18
#20

Updated by Vincent MEMBRÉ almost 9 years ago

  • Target version changed from 2.11.18 to 2.11.19
#21

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.19 to 2.11.20
#22

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.20 to 2.11.21
#23

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.21 to 2.11.22
#24

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.22 to 2.11.23
#25

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.11.23 to 2.11.24
#26

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 2.11.24 to 308
#27

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 308 to 3.1.14
#28

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 3.1.14 to 3.1.15
#29

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 3.1.15 to 3.1.16
#30

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 3.1.16 to 3.1.17
#31

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 3.1.17 to 3.1.18
#32

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 3.1.18 to 3.1.19
#33

Updated by François ARMAND over 7 years ago

  • Severity set to Critical - prevents main use of Rudder | no workaround | data loss | security
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority set to 0

I'm setting it to critical because there is no workaround, and it breaks the main purpose of Rudder. On the other hand, I'm setting it to operational because the problem is not supposed to happen in tests.

#34

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.19 to 3.1.20
#35

Updated by Jonathan CLARKE over 7 years ago

  • Status changed from Discussion to New
#36

Updated by Jonathan CLARKE over 7 years ago

  • Assignee deleted (Nicolas CHARLES)
#37

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.20 to 3.1.21
#38

Updated by Benoît PECCATTE over 7 years ago

  • Priority changed from 0 to 50

I think we should just remove this kill.
We are now 5 years later, and cf-execd should no longer have problems running for a long time.

#39

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.21 to 3.1.22
#40

Updated by Benoît PECCATTE over 7 years ago

  • Priority changed from 50 to 63
#41

Updated by Vincent MEMBRÉ over 7 years ago

  • Target version changed from 3.1.22 to 3.1.23
#42

Updated by Vincent MEMBRÉ about 7 years ago

  • Target version changed from 3.1.23 to 3.1.24
#43

Updated by Benoît PECCATTE about 7 years ago

  • Target version changed from 3.1.24 to 4.3.0~beta1

Let's just remove the restart in 4.3; it should not be needed anymore.

#44

Updated by Nicolas CHARLES about 7 years ago

  • Status changed from New to In progress
  • Assignee set to Nicolas CHARLES
#45

Updated by Nicolas CHARLES about 7 years ago

This was implemented in #7274

#46

Updated by Nicolas CHARLES about 7 years ago

  • Status changed from In progress to Rejected