Bug #5925
On SLES, when we upgrade rudder-server-root while a node is copying its promises, the copy is corrupted and files are lost
Status:
Rejected
Priority:
1 (highest)
Assignee:
Category:
Web - Config management
Target version:
Pull Request:
Severity:
Major - prevents use of part of Rudder | no simple workaround
UX impact:
User visibility:
Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Very Small
Priority:
0
Name check:
Fix check:
Regression:
Description
When upgrading rudder-server-root on SLES, something appears to break on the client side: some nodes ended up with corrupted promises (namely, missing files).
At that time, the output file in the outputs folder on the node contained:
2014-11-05T10:29:13+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:29:13+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Protocol transaction broken off (1). (ReceiveTransaction: Resource temporarily unavailable)
2014-11-05T10:29:13+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Authentication dialogue with '192.168.249.141' failed
2014-11-05T10:29:13+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: No suitable server responded to hail
2014-11-05T10:31:12+0000 error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
R: @@Common@@log_info@@hasPolicyServer-root@@common-root@@379@@common@@StartRun@@2014-11-05 10:31:15+00:00##8b4a4e31-3241-42bb-be63-8d917d3ee9c7@#Start execution
and
2014-11-05T10:33:49+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/ncf/common'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2014-11-05T10:35:06+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Couldn't receive. (recv: Connection reset by peer)
2014-11-05T10:35:06+0000 error: /default/update/methods/'update'/default/update_action/files/'/var/rudder/cfengine-community/inputs'[0]: Failed receive. (ReceiveTransaction: Connection reset by peer)
2014-11-05T10:35:06+0000 error: /default/update/methods/'update'[0]: Method 'update_action' failed in some repairs
2014-11-05T10:35:06+0000 error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/cf-served.cf' for parsing. (stat: No such file or directory)
2014-11-05T10:35:06+0000 error: Policy failed validation with command '"/var/rudder/cfengine-community/bin/cf-promises" -c "/var/rudder/cfengine-community/inputs/promises.cf"'
2014-11-05T10:35:06+0000 error: CFEngine was not able to get confirmation of promises from cf-promises, so going to failsafe
2014-11-05T10:35:06+0000 error: Can't stat file '/var/rudder/cfengine-community/inputs/common/1.0/update.cf' for parsing. (stat: No such file or directory).
This happened on SLES while upgrading from Rudder 2.11.1 to 2.11.4.
It may also happen on other systems.
Note: I did not run /etc/init.d/rudder-server-root stop before upgrading.
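To see quickly which promise files went missing on an affected node, the "Can't stat file" errors can be pulled out of the agent output. This is only a rough sketch: it assumes the output has been saved as plain text, and the function name `missing_promise_files` and the regular expression are illustrative helpers, not part of Rudder or CFEngine.

```python
import re

# Matches the "Can't stat file '<path>' for parsing" errors seen in the agent output.
# The pattern is an assumption based on the log lines quoted in this report.
CANT_STAT = re.compile(r"Can't stat file '([^']+)' for parsing")


def missing_promise_files(output_text):
    """Return the distinct paths the agent could not stat, in order of appearance."""
    seen = []
    for path in CANT_STAT.findall(output_text):
        if path not in seen:
            seen.append(path)
    return seen


# Two of the error lines from this report, used as sample input.
sample = (
    "2014-11-05T10:35:06+0000 error: Can't stat file "
    "'/var/rudder/cfengine-community/inputs/common/1.0/cf-served.cf' "
    "for parsing. (stat: No such file or directory)\n"
    "2014-11-05T10:35:06+0000 error: Can't stat file "
    "'/var/rudder/cfengine-community/inputs/common/1.0/update.cf' "
    "for parsing. (stat: No such file or directory).\n"
)

for path in missing_promise_files(sample):
    print(path)
```

Running it against the outputs of several nodes would show whether the same files are missing everywhere or whether the damage depends on when the copy was interrupted.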