Bug #6915
Cf-agent writes incorrect files when the server answers too slowly during recursive copy
Status: Closed
Description
One file was copied in place of another. The logs contain:
error: /default/download_from_shared_folder/files/'/bin/'[3]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/adm.sh', got 'CFD_FALSE'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Was not able to copy '/var/rudder/configuration-repository/shared-files/bin/dir.sh' to '/bin/dir.sh'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Cannot read SYNCH reply from '1.2.3.4', only -1/13 items parsed
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/script.sh', got ' \C2}\93\88\BB!.\ED\BE\94Q\EAy\FD/AK~
This is a known issue: https://dev.cfengine.com/issues/6027.
Updated by Alexis Mousset over 9 years ago
- Subject changed from cf-agent when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the connection with the server closes during recursive copy
Updated by Alexis Mousset over 9 years ago
The fix to this issue seems to be in: https://github.com/cfengine/core/commit/282b149f1680eb54221fb67d7f72366e2b574dce
I am trying to reproduce the problem to confirm.
Updated by Alexis Mousset over 9 years ago
I reproduced the bug by adding a sleep(60) on the server before it answers MD5 requests.
2015-07-01T14:57:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:57:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/47.txt', got 'CFD_FALSE'
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/90.txt' failed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/87.txt', got 'Ɇh�U�=�'
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/18.txt' failed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:58:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/20.txt', got 'CFD_TRUE'
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/83.txt' failed
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/14.txt', got '�2U�F�'
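This appears to be a request/response ordering problem after a slow answer: the agent gives up on a request after the timeout but keeps using the same connection, so the server's late reply is presumably read as the reply to the next request (hence a stat returning 'CFD_FALSE', 'CFD_TRUE' or raw file bytes). It was also reproduced with a 3.6.5 agent.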
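A minimal sketch of the suspected desynchronisation, assuming fixed-size 8-byte replies (matching "expecting=8" in the logs) and an in-process socket pair standing in for the agent and a slow server. This is not CFEngine code; `recv_exact`, `simulate_desync` and the message contents are hypothetical.

```python
import socket

MSG_LEN = 8  # fixed-size replies, like the 8 bytes the agent expected in the logs


def recv_exact(sock, n):
    """Read exactly n bytes; raises socket.timeout if the peer is too slow."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def simulate_desync(timeout=0.1):
    """Return the bytes the client accepts as the answer to its second request."""
    client, server = socket.socketpair()  # in-process stand-in for agent <-> server
    client.settimeout(timeout)
    try:
        # 1. The client asks about file A; the server does not answer in time.
        client.sendall(b"stat-A..")
        try:
            recv_exact(client, MSG_LEN)
        except socket.timeout:
            pass  # the error is logged, but the same connection keeps being used
        # 2. The server's late answer to A lands in the client's receive buffer.
        server.sendall(b"reply-A!")
        # 3. The client moves on to file B over the same connection.
        client.sendall(b"stat-B..")
        server.sendall(b"reply-B!")
        # 4. The first 8 bytes waiting are A's reply, not B's.
        return recv_exact(client, MSG_LEN)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(simulate_desync())  # b'reply-A!'
```

Every later request on that connection stays one reply behind, which fits the log pattern of stat replies carrying unrelated content.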
Updated by Alexis Mousset over 9 years ago
- Subject changed from cf-agent writes incorrect files when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the server answers too slowly during recursive copy
Updated by Alexis Mousset over 9 years ago
- Status changed from New to In progress
Updated by Alexis Mousset over 9 years ago
- Target version set to 2.11.12
It seems there is no clean and easy fix for this issue. The best way would be to properly close the socket when there is an error and remove it from the connection cache, as described in TODO comments in the code, but this would require too many changes. We could also stop the execution of the current copy promise using the error flag of the connection.
A simpler fix is to close the socket descriptor when a timeout is detected. This prevents any further communication on that connection, and thus prevents file corruption.
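A minimal Python sketch of the "close the descriptor on timeout" approach, under the same assumptions as a fixed 8-byte reply and a simulated server that never answers. The `Connection` class, its `request` method and `simulate_with_fix` are hypothetical and are not the patch applied in rudder-packages; in particular, the sketch does not model the connection cache.

```python
import socket

MSG_LEN = 8  # fixed-size replies (assumption, matching "expecting=8" in the logs)


class Connection:
    """Hypothetical client connection that closes its socket on a receive timeout."""

    def __init__(self, sock, timeout):
        sock.settimeout(timeout)
        self.sock = sock

    def request(self, payload):
        if self.sock is None:
            # Fail fast instead of reading whatever a slow server sends later.
            raise ConnectionError("connection closed after an earlier timeout")
        self.sock.sendall(payload)
        data = b""
        try:
            while len(data) < MSG_LEN:
                chunk = self.sock.recv(MSG_LEN - len(data))
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                data += chunk
        except socket.timeout:
            # The fix: drop the descriptor so a late reply can never be
            # mistaken for the answer to a later request.
            self.sock.close()
            self.sock = None
            raise
        return data


def simulate_with_fix(timeout=0.1):
    """Send two requests to a server that never answers; report each outcome."""
    client, server = socket.socketpair()
    conn = Connection(client, timeout)
    outcomes = []
    for payload in (b"stat-A..", b"stat-B.."):
        try:
            outcomes.append(conn.request(payload))
        except socket.timeout:
            outcomes.append("timeout")
        except ConnectionError:
            outcomes.append("refused")
    if conn.sock is not None:
        conn.sock.close()
    server.close()
    return outcomes


if __name__ == "__main__":
    print(simulate_with_fix())  # ['timeout', 'refused']
```

The second request fails cleanly instead of receiving stale data. The trade-off is that the rest of the copy on that connection fails too, which is why removing the connection from the cache and reconnecting would be the more complete fix.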
Updated by Alexis Mousset over 9 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder-packages/pull/706
Updated by Alexis Mousset over 9 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset rudder-packages|c4994ed2e851e05886521a9457bd6f7ddce2742e.
Updated by Nicolas CHARLES over 9 years ago
Applied in changeset rudder-packages|73f7eb861e93f97db12b6af28e54bd171d247f3e.
Updated by Vincent MEMBRÉ over 9 years ago
- Subject changed from cf-agent writes incorrect files when the server answers too slowly during recursive copy to Cf-agent writes incorrect files when the server answers too slowly during recursive copy
Updated by Vincent MEMBRÉ over 9 years ago
- Status changed from Pending release to Released
Updated by Alexis Mousset about 9 years ago
The issue was fixed upstream by 3 pull requests: 2350, 2351, 2356. These are cleaner and more complete than our fix, and we could probably backport the improvements.
Updated by Alexis Mousset almost 9 years ago
- Related to Bug #7629: rudder-agent does not stop on network error during file copy, which can lead to file deletions when purging is enabled added