Bug #6915
closed
Cf-agent writes incorrect files when the server answers too slowly during recursive copy
Added by Alexis Mousset over 9 years ago.
Updated about 9 years ago.
Description
One file was copied to the location of another. The logs contain:
error: /default/download_from_shared_folder/files/'/bin/'[3]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/adm.sh', got 'CFD_FALSE'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Was not able to copy '/var/rudder/configuration-repository/shared-files/bin/dir.sh' to '/bin/dir.sh'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Cannot read SYNCH reply from '1.2.3.4', only -1/13 items parsed
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/script.sh', got ' \C2}\93\88\BB!.\ED\BE\94Q\EAy\FD/AK~
This is a known issue: https://dev.cfengine.com/issues/6027.
- Subject changed from cf-agent when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the connection with the server closes during recursive copy
- Description updated (diff)
I reproduced the bug by adding a sleep(60) on the server before it answers MD5 requests.
2015-07-01T14:57:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:57:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/47.txt', got 'CFD_FALSE'
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/90.txt' failed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/87.txt', got 'Ɇh�U�=�'
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/18.txt' failed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:58:17+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/20.txt', got 'CFD_TRUE'
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/83.txt' failed
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:47+0000 error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/14.txt', got '�2U�F�'
This seems to be a problem in request/response ordering after a slow answer. It was reproduced with a 3.6.5 agent too.
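To illustrate the suspected ordering problem, here is a toy model in Python. It is not CFEngine code: the SlowServer class, the copy_files function and the "reply-to:" strings are hypothetical, and the network is replaced by an in-memory queue. It only shows how, once one reply arrives after the client has given up on it, every later request on the same connection can be paired with the previous request's reply.

```python
from collections import deque


class SlowServer:
    """Toy server (hypothetical): answers every request in order,
    but the reply to one chosen request arrives after the client's timeout."""

    def __init__(self, slow_request):
        self.slow_request = slow_request
        self.wire = deque()   # replies the client can read right now
        self.late = None      # reply that is still "in flight"

    def send(self, request):
        reply = f"reply-to:{request}"
        if request == self.slow_request:
            self.late = reply          # not readable yet
        else:
            self.wire.append(reply)

    def recv(self):
        """Return the next readable reply, or None to model a timeout."""
        if self.wire:
            return self.wire.popleft()
        if self.late is not None:
            # The late reply lands on the wire only after the client gave up.
            self.wire.append(self.late)
            self.late = None
        return None


def copy_files(server, names):
    """Naive client: keeps using the same connection after a timeout."""
    results = {}
    for name in names:
        server.send(name)
        results[name] = server.recv()
    return results


results = copy_files(SlowServer("adm.sh"), ["adm.sh", "dir.sh", "script.sh"])
for name, reply in results.items():
    print(name, "<-", reply)
```

In this model the request for adm.sh times out, and dir.sh then receives the reply meant for adm.sh, which matches the symptom of one file being written in place of another.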
- Subject changed from cf-agent writes incorrect files when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the server answers too slowly during recursive copy
- Status changed from New to In progress
- Target version set to 2.11.12
It seems there is no clean and easy fix for this issue. The best way would be to properly close the socket when there is an error, and remove it from the connection cache, as described in TODO comments in the code, but it requires too many changes. We could also stop the execution of the current copy promise using the error flag of the connection.
A simple fix is to close the socket descriptor when detecting a timeout, avoiding further communication, and thus preventing file corruption.
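The idea of the simple fix can be sketched in Python as follows. This is a toy model under stated assumptions, not the actual patch: the Connection class, the ConnectionClosed exception and the copy_files function are hypothetical names, and the socket is replaced by an in-memory queue. After a timeout the connection is marked closed, so no later request can read a stale reply.

```python
from collections import deque


class ConnectionClosed(Exception):
    """Raised when a request is attempted on a connection closed after a timeout."""


class Connection:
    """Toy connection (hypothetical): the reply to one chosen request is too slow."""

    def __init__(self, slow_request):
        self.slow_request = slow_request
        self.wire = deque()   # replies readable right now
        self.closed = False

    def request(self, name):
        if self.closed:
            raise ConnectionClosed(name)
        if name != self.slow_request:
            self.wire.append(f"reply-to:{name}")
        if self.wire:
            return self.wire.popleft()
        # Timeout: close the connection so the late reply is never read.
        self.closed = True
        return None


def copy_files(conn, names):
    """Copy what can be copied; report everything else as failed."""
    copied, failed = {}, []
    for name in names:
        try:
            reply = conn.request(name)
        except ConnectionClosed:
            failed.append(name)
            continue
        if reply is None:
            failed.append(name)
        else:
            copied[name] = reply
    return copied, failed


copied, failed = copy_files(Connection("dir.sh"), ["adm.sh", "dir.sh", "script.sh"])
print("copied:", copied)
print("failed:", failed)
```

The trade-off in this sketch is that every file after the timeout fails for the rest of the run, but no file is ever paired with another file's reply.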
- Status changed from In progress to Pending technical review
- Assignee changed from Alexis Mousset to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder-packages/pull/706
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
- Subject changed from cf-agent writes incorrect files when the server answers too slowly during recursive copy to Cf-agent writes incorrect files when the server answers too slowly during recursive copy
- Status changed from Pending release to Released
This bug has been fixed in Rudder 2.11.12, 3.0.7 and 3.1.0 which were released today.
The issue was fixed upstream by 3 pull requests: 2350, 2351, 2356. These are cleaner and more complete than our fix, and we could probably backport the improvements.
- Related to Bug #7629: rudder-agent does not stop on network error during file copy, which can lead to file deletions when purging is enabled added