Bug #5925

closed

On SLES, when we upgrade rudder-server-root while a node is copying its promises, the copy is corrupted and files are lost

Added by Nicolas CHARLES almost 10 years ago. Updated about 6 years ago.

Status: Rejected
Priority: 1 (highest)
Category: Web - Config management
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required: Very Small
Priority: 0
Name check:
Fix check:
Regression:

Description

Upgrading rudder-server-root on a SLES server seems to have broken something on the client side: some nodes ended up with corrupted promises (namely, missing files).

At that time, the output file in the outputs folder on the node side was:

2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Protocol transaction broken off (1). (ReceiveTransaction: Resource temporarily unavailable)
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Authentication dialogue with '192.168.249.141' failed
2014-11-05T10:29:13+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: No suitable server responded to hail
2014-11-05T10:31:12+0000    error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@379@@common@@StartRun@@2014-11-05 10:31:15+00:00##8b4a4e31-3241-42bb-be63-8d917d3ee9c7@#Start execution

and, a few minutes later:

2014-11-05T10:33:49+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Couldn't receive. (recv: Connection reset by peer)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Failed receive. (ReceiveTransaction: Connection reset by peer)
2014-11-05T10:35:06+0000    error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
2014-11-05T10:35:06+0000    error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/cf-served.cf' for parsing. (stat: No such file or directory)
2014-11-05T10:35:06+0000    error: Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
2014-11-05T10:35:06+0000    error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
2014-11-05T10:35:06+0000    error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' for parsing. (stat: No such file or directory).

This happened on SLES while upgrading from Rudder 2.11.1 to 2.11.4.
It may also happen on other systems.

Note: I did not run /etc/init.d/rudder-server-root stop before upgrading.
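The errors above follow a recognizable pattern: the copy of one or more policy directories is cut off, and cf-promises then cannot find files that the policy references. As a minimal sketch, a Python helper could scan a node's output file for both kinds of errors to spot nodes left with a partial copy. It assumes the log line format quoted in this report; summarize_update_errors and UpdateErrors are hypothetical names, not part of Rudder or CFEngine.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Line patterns taken from the agent output quoted in this report (assumed format).
FAILED_COPY = re.compile(r"update_action/files/'([^']+)'\[\d+\]: (.+)$")
MISSING_FILE = re.compile(r"Can't stat file '([^']+)' for parsing")


@dataclass
class UpdateErrors:
    # Directory whose copy failed -> error messages reported for it.
    failed_copies: Dict[str, List[str]] = field(default_factory=dict)
    # Policy files cf-promises could not stat while parsing.
    missing_files: List[str] = field(default_factory=list)


def summarize_update_errors(log_text: str) -> UpdateErrors:
    """Collect failed copy sources and unreadable policy files from agent output."""
    result = UpdateErrors()
    for line in log_text.splitlines():
        missing = MISSING_FILE.search(line)
        if missing:
            result.missing_files.append(missing.group(1))
            continue
        copy = FAILED_COPY.search(line)
        if copy:
            # Group every error message under the directory whose copy failed.
            result.failed_copies.setdefault(copy.group(1), []).append(copy.group(2))
    return result
```

Fed the two log excerpts above, this sketch would report /var/rudder/ncf/common and /var/rudder/cfengine-community/inputs as failed copies, and list cf-served.cf and update.cf as missing policy files.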


Related issues: 1 (0 open, 1 closed)

Related to Rudder - Bug #12265: rudder agent check should trigger failsafe run when promises are broken (Released, Alexis Mousset)