Bug #6915

closed

Cf-agent writes incorrect files when the server answers too slowly during recursive copy

Added by Alexis Mousset over 8 years ago. Updated over 8 years ago.

Status: Released
Priority: N/A
Category: Agent

Description

One file was copied in the location of another. The logs contain:

error: /default/download_from_shared_folder/files/'/bin/'[3]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/adm.sh', got 'CFD_FALSE'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Was not able to copy '/var/rudder/configuration-repository/shared-files/bin/dir.sh' to '/bin/dir.sh'
error: /default/download_from_shared_folder/files/'/bin/'[3]: Cannot read SYNCH reply from '1.2.3.4', only -1/13 items parsed
error: /default/download_from_shared_folder/files/'/bin/'[3]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/bin/script.sh', got '    \C2}\93\88\BB!.\ED\BE\94Q\EAy\FD/AK~

This is a known issue: https://dev.cfengine.com/issues/6027.


Subtasks 1 (0 open, 1 closed)

Bug #6934: cf-agent writes incorrect files when the server answers too slowly during recursive copy on 3.5.3 (Released, Matthieu CERDA, 2015-07-07)

Related issues 1 (0 open, 1 closed)

Related to Rudder - Bug #7629: rudder-agent does not stop on network error during file copy, which can lead to file deletions when purging is enabled (Released, Alexis Mousset, 2015-12-17)
Actions #1

Updated by Alexis Mousset over 8 years ago

  • Subject changed from cf-agent when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the connection with the server closes during recursive copy
Actions #2

Updated by Alexis Mousset over 8 years ago

  • Description updated (diff)
Actions #3

Updated by Alexis Mousset over 8 years ago

The fix to this issue seems to be in: https://github.com/cfengine/core/commit/282b149f1680eb54221fb67d7f72366e2b574dce

I am trying to reproduce the problem to confirm.

Actions #4

Updated by Alexis Mousset over 8 years ago

I reproduced the bug by adding a sleep(60) before answering MD5 requests on the server.

2015-07-01T14:57:17+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:57:17+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/47.txt', got 'CFD_FALSE'
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/90.txt' failed
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/87.txt', got 'Ɇh�U�=�'
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/18.txt' failed
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:57:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:17+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Timeout - remote end did not respond with the expected amount of data (received=0, expecting=8). (recv: Resource temporarily unavailable)
2015-07-01T14:58:17+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Failed receive. (ReceiveTransaction: Resource temporarily unavailable)
2015-07-01T14:58:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/20.txt', got 'CFD_TRUE'
2015-07-01T14:58:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Copy from 'server:/var/rudder/configuration-repository/shared-files/recv/83.txt' failed
2015-07-01T14:58:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Cannot read SYNCH reply from '192.168.44.2', only -1/13 items parsed
2015-07-01T14:58:47+0000    error: /default/download_from_shared_folder/files/'/tmp/recv'[0]: Transmission refused or failed statting '/var/rudder/configuration-repository/shared-files/recv/14.txt', got '�2U�F�'

This seems to be a problem in request/response ordering after a slow answer. It was reproduced with a 3.6.5 agent too.
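
To illustrate the suspected ordering problem, here is a minimal simulation in Python. It is not CFEngine code: the fixed-size framing, the `OK <name>` reply format, and the helpers `slow_server` and `naive_stat` are all hypothetical. It only shows how a client that keeps reusing a connection after a timeout can read the late reply to one request as the answer to the next one.

```python
import socket
import threading
import time

FRAME = 16  # hypothetical fixed message size, so two replies never merge


def recv_frame(sock):
    """Read exactly one FRAME-byte message and strip its padding."""
    data = b""
    while len(data) < FRAME:
        chunk = sock.recv(FRAME - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data.rstrip()


def send_frame(sock, payload):
    """Send one message padded to FRAME bytes."""
    sock.sendall(payload.ljust(FRAME))


def slow_server(conn, delays):
    """Answer request i with 'OK <name>' after sleeping delays[i] seconds."""
    with conn:
        try:
            for delay in delays:
                name = recv_frame(conn)
                time.sleep(delay)
                send_frame(conn, b"OK " + name)
        except OSError:
            return  # client went away; nothing left to answer


def naive_stat(sock, name):
    """Send a request and trust the next frame on the wire to be its reply."""
    send_frame(sock, name)
    try:
        return recv_frame(sock)
    except socket.timeout:
        return None  # the error is reported, but the connection stays in use


def demo():
    client, server = socket.socketpair()
    client.settimeout(0.3)
    worker = threading.Thread(target=slow_server, args=(server, [0.45, 0.0]))
    worker.start()
    first = naive_stat(client, b"adm.sh")   # times out, reply is still pending
    second = naive_stat(client, b"dir.sh")  # receives the late adm.sh reply
    client.close()
    worker.join()
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this sketch the request for `dir.sh` is answered with the reply that belonged to `adm.sh`, which matches the kind of mix-up seen in the logs above.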

Actions #5

Updated by Alexis Mousset over 8 years ago

  • Subject changed from cf-agent writes incorrect files when the connection with the server closes during recursive copy to cf-agent writes incorrect files when the server answers too slowly during recursive copy
Actions #6

Updated by Alexis Mousset over 8 years ago

  • Status changed from New to In progress
Actions #7

Updated by Alexis Mousset over 8 years ago

  • Target version set to 2.11.12

It seems there is no clean and easy fix for this issue. The best way would be to properly close the socket when there is an error, and remove it from the connection cache, as described in TODO comments in the code, but it requires too many changes. We could also stop the execution of the current copy promise using the error flag of the connection.

A simple fix is to close the socket descriptor when detecting a timeout, avoiding further communication, and thus preventing file corruption.
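
The idea of the simple fix can be sketched with a small Python simulation. This is only an illustration under assumptions, not the actual patch: the framing, the `OK <name>` reply format, and the helpers `slow_server` and `guarded_stat` are hypothetical. The client closes its socket as soon as a timeout is detected, so later requests fail cleanly instead of reading a stale reply.

```python
import socket
import threading
import time

FRAME = 16  # hypothetical fixed message size


def recv_frame(sock):
    """Read exactly one FRAME-byte message and strip its padding."""
    data = b""
    while len(data) < FRAME:
        chunk = sock.recv(FRAME - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data.rstrip()


def send_frame(sock, payload):
    """Send one message padded to FRAME bytes."""
    sock.sendall(payload.ljust(FRAME))


def slow_server(conn, delays):
    """Answer request i with 'OK <name>' after sleeping delays[i] seconds."""
    with conn:
        try:
            for delay in delays:
                name = recv_frame(conn)
                time.sleep(delay)
                send_frame(conn, b"OK " + name)
        except OSError:
            return  # client closed the socket; stop answering


def guarded_stat(sock, name):
    """Request/reply that closes the socket on timeout."""
    try:
        send_frame(sock, name)
        return recv_frame(sock)
    except socket.timeout:
        sock.close()  # prevents any further communication on this socket
        return None
    except OSError:
        return None  # socket was already closed by an earlier timeout


def demo_fixed():
    client, server = socket.socketpair()
    client.settimeout(0.3)
    worker = threading.Thread(target=slow_server, args=(server, [0.45, 0.0]))
    worker.start()
    first = guarded_stat(client, b"adm.sh")   # times out and closes the socket
    second = guarded_stat(client, b"dir.sh")  # fails instead of misreading
    worker.join()
    return first, second


if __name__ == "__main__":
    print(demo_fixed())
```

Both requests fail here, which loses the rest of the copy for this run but never attributes one file's reply to another file.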

Actions #8

Updated by Alexis Mousset over 8 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Alexis Mousset to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/706
Actions #9

Updated by Alexis Mousset over 8 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
Actions #11

Updated by Vincent MEMBRÉ over 8 years ago

  • Subject changed from cf-agent writes incorrect files when the server answers too slowly during recursive copy to Cf-agent writes incorrect files when the server answers too slowly during recursive copy
Actions #12

Updated by Vincent MEMBRÉ over 8 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.11.12, 3.0.7 and 3.1.0 which were released today.

Actions #13

Updated by Alexis Mousset over 8 years ago

The issue was fixed upstream by 3 pull requests: 2350, 2351, 2356. These are cleaner and more complete than our fix, and we could probably backport the improvements.

Actions #14

Updated by Alexis Mousset over 8 years ago

  • Related to Bug #7629: rudder-agent does not stop on network error during file copy, which can lead to file deletions when purging is enabled added