Bug #5925

On SLES, upgrading rudder-server-root while a node is copying its promises corrupts the copy, and files are lost

Added by Nicolas CHARLES about 4 years ago. Updated 30 days ago.

Status: Rejected
Priority: 1
Category: Web - Config management
Target version:
Pull Request:
Severity: Major - prevents use of part of Rudder | no simple workaround
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required: Very Small
Priority: 0

Description

When rudder-server-root is upgraded on SLES, it seems to break something on the client side: some nodes end up with corrupted promises (namely, missing files).

At that time, the file in the outputs folder on the node contains:

2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Protocol transaction broken off (1). (ReceiveTransaction: Resource temporarily unavailable)
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Authentication dialogue with '192.168.249.141' failed
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: No suitable server responded to hail
2014-11-05T10:31:12+0000    error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@379@@common@@StartRun@@2014-11-05 10:31:15+00:00##8b4a4e31-3241-42bb-be63-8d917d3ee9c7@#Start execution

and
2014-11-05T10:33:49+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Couldn't receive. (recv: Connection reset by peer)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Failed receive. (ReceiveTransaction: Connection reset by peer)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
2014-11-05T10:35:06+0000    error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/cf-served.cf' for parsing. (stat: No such file or directory)
2014-11-05T10:35:06+0000    error: Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
2014-11-05T10:35:06+0000    error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
2014-11-05T10:35:06+0000    error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' for parsing. (stat: No such file or directory).

Happened on SLES, while upgrading from Rudder 2.11.1 to 2.11.4.
May happen on other systems.

Note: I did not run /etc/init.d/rudder-server-root stop before upgrading


Related issues

Related to Rudder - Bug #12265: rudder agent check should trigger failsafe run when promises are broken (Released)

History

#1 Updated by Vincent MEMBRÉ almost 4 years ago

  • Target version changed from 2.11.5 to 2.11.6

#2 Updated by François ARMAND almost 4 years ago

The only things I can see that can be done are:

- forcing the server to stop before upgrading, to shut down connections nicely
- testing client-side whether the copied promises are OK => see #5641

Are there any other ideas on that?
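
The second option above (testing the copied promises client-side) could look roughly like the sketch below. It is only an illustration, not how Rudder or #5641 actually implements the check. The function names `cf_promises_ok` and `install_if_valid` and the idea of a separate staging directory are hypothetical. The validation command is the one quoted in the error log of this ticket and is not verified against other versions.

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Callable

# Path taken from the log in this ticket; adjust for your installation.
CF_PROMISES = "/var/rudder/cfengine-community/bin/cf-promises"


def cf_promises_ok(inputs_dir: Path) -> bool:
    """Run the validation command seen in the log against a promises directory."""
    cmd = [CF_PROMISES, "-c", str(inputs_dir / "promises.cf")]
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except OSError:
        # Binary missing or not executable: treat as "not validated".
        return False


def install_if_valid(staging: Path, live: Path,
                     validate: Callable[[Path], bool]) -> bool:
    """Replace `live` with `staging` only if `staging` validates.

    Hypothetical helper: the copy from the server is assumed to land in
    `staging` first, so a broken transfer never touches `live`.
    """
    if not validate(staging):
        return False  # keep the current promises untouched
    backup = live.with_name(live.name + ".old")
    shutil.rmtree(backup, ignore_errors=True)
    if live.exists():
        os.rename(live, backup)  # keep the previous set for rollback
    os.rename(staging, live)
    return True
```

On a node this might be called as `install_if_valid(Path("/var/rudder/cfengine-community/inputs.new"), Path("/var/rudder/cfengine-community/inputs"), cf_promises_ok)`, where the `inputs.new` staging path is an assumption. The two renames are not a single atomic step, so there is a brief window in which `live` does not exist.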

#3 Updated by Vincent MEMBRÉ almost 4 years ago

  • Target version changed from 2.11.6 to 2.11.7

#4 Updated by Vincent MEMBRÉ almost 4 years ago

  • Target version changed from 2.11.7 to 2.11.8

#5 Updated by Vincent MEMBRÉ almost 4 years ago

  • Target version changed from 2.11.8 to 2.11.9

#6 Updated by Benoît PECCATTE almost 4 years ago

  • Category changed from 14 to Web - Config management

#7 Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 2.11.9 to 2.11.10

#8 Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 2.11.10 to 2.11.11

#9 Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 2.11.11 to 2.11.12

#10 Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 2.11.12 to 2.11.13

#11 Updated by Vincent MEMBRÉ over 3 years ago

  • Target version changed from 2.11.13 to 2.11.14

#12 Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 2.11.14 to 2.11.15

#13 Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 2.11.15 to 2.11.16

#14 Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 2.11.16 to 2.11.17

#15 Updated by Vincent MEMBRÉ about 3 years ago

  • Target version changed from 2.11.17 to 2.11.18

#16 Updated by Vincent MEMBRÉ almost 3 years ago

  • Target version changed from 2.11.18 to 2.11.19

#17 Updated by Vincent MEMBRÉ almost 3 years ago

  • Target version changed from 2.11.19 to 2.11.20

#18 Updated by Vincent MEMBRÉ over 2 years ago

  • Target version changed from 2.11.20 to 2.11.21

#19 Updated by Vincent MEMBRÉ over 2 years ago

  • Target version changed from 2.11.21 to 2.11.22

#20 Updated by Vincent MEMBRÉ over 2 years ago

  • Target version changed from 2.11.22 to 2.11.23

#21 Updated by Vincent MEMBRÉ over 2 years ago

  • Target version changed from 2.11.23 to 2.11.24

#22 Updated by Vincent MEMBRÉ over 2 years ago

  • Target version changed from 2.11.24 to 308

#23 Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 308 to 3.1.14

#24 Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 3.1.14 to 3.1.15

#25 Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 3.1.15 to 3.1.16

#26 Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 3.1.16 to 3.1.17

#27 Updated by Vincent MEMBRÉ about 2 years ago

  • Target version changed from 3.1.17 to 3.1.18

#28 Updated by Vincent MEMBRÉ almost 2 years ago

  • Target version changed from 3.1.18 to 3.1.19

#29 Updated by François ARMAND over 1 year ago

  • Severity set to Major - prevents use of part of Rudder | no simple workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority set to 30

#30 Updated by Vincent MEMBRÉ over 1 year ago

  • Target version changed from 3.1.19 to 3.1.20

#31 Updated by Vincent MEMBRÉ over 1 year ago

  • Target version changed from 3.1.20 to 3.1.21

#32 Updated by Vincent MEMBRÉ over 1 year ago

  • Target version changed from 3.1.21 to 3.1.22

#33 Updated by Benoît PECCATTE over 1 year ago

  • Priority changed from 30 to 43

#34 Updated by Vincent MEMBRÉ over 1 year ago

  • Target version changed from 3.1.22 to 3.1.23

#35 Updated by Vincent MEMBRÉ over 1 year ago

  • Target version changed from 3.1.23 to 3.1.24

#36 Updated by Vincent MEMBRÉ about 1 year ago

  • Target version changed from 3.1.24 to 3.1.25

#37 Updated by Vincent MEMBRÉ about 1 year ago

  • Target version changed from 3.1.25 to 387

#38 Updated by Vincent MEMBRÉ 12 months ago

  • Target version changed from 387 to 4.1.10

#39 Updated by Vincent MEMBRÉ 10 months ago

  • Target version changed from 4.1.10 to 4.1.11
  • Priority changed from 43 to 44

#40 Updated by Vincent MEMBRÉ 8 months ago

  • Target version changed from 4.1.11 to 4.1.12
  • Priority changed from 44 to 45

#41 Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 4.1.12 to 4.1.13

#42 Updated by Vincent MEMBRÉ 5 months ago

  • Target version changed from 4.1.13 to 4.1.14
  • Priority changed from 45 to 46

#43 Updated by Benoît PECCATTE 4 months ago

  • Target version changed from 4.1.14 to 4.1.15

#44 Updated by Nicolas CHARLES 2 months ago

  • Effort required set to Very Small
  • Priority changed from 46 to 73

I'm quite sure it can recover, given that rudder agent health monitors the state of policies.
Setting to Very Small to try to reproduce and check that it does indeed recover.

#45 Updated by Vincent MEMBRÉ about 2 months ago

  • Target version changed from 4.1.15 to 4.1.16

#46 Updated by Vincent MEMBRÉ about 1 month ago

  • Target version changed from 4.1.16 to 4.1.17
  • Priority changed from 73 to 74

#47 Updated by François ARMAND about 1 month ago

  • Assignee set to Nicolas CHARLES

#48 Updated by Nicolas CHARLES 30 days ago

  • Status changed from New to Rejected
  • Priority changed from 74 to 0

This has been fixed via #12265: if policies are corrupted, rudder agent check will fix them.

#49 Updated by Nicolas CHARLES 30 days ago

  • Related to Bug #12265: rudder agent check should trigger failsafe run when promises are broken added
