Bug #3052
closed
Having an error with a Directive based on Download from a shared folder using Generic Variable Definition, will lead to all the Directives using Generic Variable to be in error
Description
- A Directive based on Download From A Shared Folder with a source using a Generic Variable
- Another Directive based on Download From A Shared Folder with a source using a Generic Variable but with a typo like "$(generic_variable_definiton.myvar)"
Both Directives will be in error, and the agent run on the client will report a connection failure.
Updated by François ARMAND about 12 years ago
- Subject changed from Having an error with a Directive based on Download from a shared folder using Generic Variable Definition, will lead to all the Directives using Generic Variable to Having an error with a Directive based on Download from a shared folder using Generic Variable Definition, will lead to all the Directives using Generic Variable to be in error
Updated by François ARMAND about 12 years ago
- Category set to Web - Compliance & node report
- Status changed from New to 2
- Assignee set to Nicolas CHARLES
- Target version set to 2.4.0~rc2
Updated by Nicolas CHARLES about 12 years ago
- Status changed from 2 to In progress
Updated by Nicolas CHARLES about 12 years ago
The problem is technique/cfengine related, not reporting related:
I reproduced it on a test environment. What happens is that the server denies further connections from the node once the request containing the unexpanded $() has been refused as invalid:
rudder> Allowing 192.168.110.21 to connect without (re)checking ID
rudder> Non-verified Host ID is 192.168.110.21 (Using skipverify)
rudder> Non-verified User ID seems to be root (Using skipverify)
rudder> -> Public key identity of host "192.168.110.21" is "MD5=f0318b7cb678e7f03a586ca784110555"
rudder> -> Last saw -MD5=f0318b7cb678e7f03a586ca784110555 (alias 192.168.110.21) at Thu Nov 29 12:37:06 2012
rudder> A public key was already known from 192.168.110.21/192.168.110.21 - no trust required
rudder> Adding IP 192.168.110.21 to SkipVerify - no need to check this if we have a key
rudder> The public key identity was confirmed as root@192.168.110.21
rudder> -> Strong authentication of client 192.168.110.21/192.168.110.21 achieved
rudder> -> Receiving session key from client (size=256)...
rudder> Filename /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2) is resolved to /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2)
rudder> Couldn't stat filename /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2) requested by host 192.168.110.21
rudder> !!! System error for lstat: "No such file or directory"
rudder> Access control in sync
rudder> From (host=192.168.110.21,user=root,ip=192.168.110.21)
rudder> REFUSAL of request from connecting host: (SYNCH 1354189026 STAT /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2))
rudder> -> Accepting a connection
rudder> Denying repeated connection from "192.168.110.21"
On the client side :
rudder> Comment: Enforce content of /tmp/two based on the content on the Rudder server with mtime method
rudder> .........................................................
rudder>
rudder> -> Copy file /tmp/two from /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2) check
rudder> No existing connection to 192.168.110.20 is established...
rudder> Set cfengine port number to 5309 = 5309
rudder> Set connection timeout to 10
rudder> -> Connect to 192.168.110.20 = 192.168.110.20 on port 5309
rudder> -> Matched IP 192.168.110.20 to key MD5=e82b35316903e3400a840a83fae1d295
rudder> .....................[.h.a.i.l.].................................
rudder> Strong authentication of server=192.168.110.20 connection confirmed
rudder> -> Public key identity of host "192.168.110.20" is "MD5=e82b35316903e3400a840a83fae1d295"
rudder> -> Last saw +MD5=e82b35316903e3400a840a83fae1d295 (alias 192.168.110.20) at Thu Nov 29 12:37:06 2012
rudder> Server returned error: Unspecified server refusal (see verbose server output)
rudder> Can't stat /var/rudder/configuration-repository/shared-files/$(generic_variable_definiton.def2) in files.copyfrom promise
rudder> ?> defining promise result class copy_file_1_failed
(snip)
rudder> -> Handling file existence constraints on /tmp/one
rudder> -> File permissions on /tmp/one as promised
rudder> ?> defining promise result class copy_file_2_kept
rudder> -> Handling file existence constraints on /tmp/one
rudder> -> File permissions on /tmp/one as promised
rudder> ?> defining promise result class copy_file_2_kept
rudder> -> Copy file /tmp/one from /var/rudder/configuration-repository/shared-files/def1 check
rudder> Existing connection to 192.168.110.20 seems to be active...
rudder> Set cfengine port number to 5309 = 5309
rudder> Set connection timeout to 10
rudder> -> Connect to 192.168.110.20 = 192.168.110.20 on port 5309
rudder> -> Matched IP 192.168.110.20 to key MD5=e82b35316903e3400a840a83fae1d295
rudder> Couldn't send
rudder> !!! System error for send: "Broken pipe"
rudder> Couldn't send
rudder> !!! System error for send: "Broken pipe"
rudder> Couldn't send
rudder> !!! System error for send: "Broken pipe"
rudder> Challenge response from server 192.168.110.20/192.168.110.20 was incorrect!
rudder> I: Report relates to a promise with handle ""
rudder> I: Made in version 'not specified' of '/var/rudder/cfengine-community/inputs/copyGitFile/1.3/copyFileFromSharedFolder.cf' near line 90
rudder> I: Comment: Enforce content of file /tmp/one based on the content on the Rudder server with mtime method
rudder> !! Authentication dialogue with 192.168.110.20 failed
rudder> Unable to establish connection with 192.168.110.20
rudder> ?> defining promise result class copy_file_2_failed
Updated by Nicolas CHARLES about 12 years ago
- Status changed from In progress to Discussion
Having only one connection available per node is clearly limiting for the download from a shared folder technique
Adding an "allowallconnects" attribute in the server promises ( http://cfengine.com/manuals/cf3-Reference#allowallconnects-in-server ) solved the issue.
It would allow each node to have several connections with the server. The obvious benefit concerns long copies: today, while a long copy is running, other agent executions on the same node cannot connect to the server to fetch new promises. Apparently, if a copy fails, the connection is also released late.
The risk is that if there are a lot of agents running on a specific node, they can hammer the policy server (but I'm not sure it would really hammer, as they would still start every 5 minutes).
Should we implement this fix in 2.3 and/or 2.4 ??
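For illustration, a minimal sketch of what such a change could look like in the server control body. This is not the actual content of the DistributePolicy technique: the address pattern is hypothetical (taken from the test network in the logs above), and the real promises presumably use the node ACL already defined by Rudder.

```cf3
body server control
{
  # Hosts allowed to connect to cf-serverd at all
  # (hypothetical pattern matching the test network 192.168.110.x)
  allowconnects    => { "192\.168\.110\..*" };

  # Hosts allowed to hold more than one connection at the same time;
  # without this, a second connection from the same IP is refused with
  # "Denying repeated connection from ..."
  allowallconnects => { "192\.168\.110\..*" };
}
```

With the same list in both attributes, every node that may connect may also open several connections, so a refused or long-running copy would no longer block the next request from that node.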
Updated by Nicolas CHARLES about 12 years ago
- Assignee changed from Nicolas CHARLES to Jonathan CLARKE
Jon, can we implement this change in 2.3 and 2.4 ? It's a one line modification in the PT/Technique DistributePolicy
Updated by Jonathan CLARKE about 12 years ago
- Target version changed from 2.4.0~rc2 to 2.3.10
Yes, this seems like a good fix to me. I note that we have already set the max number of connections quite high, so this shouldn't be a problem (1000).
Of course, it must be fixed in 2.3 and 2.4, since this bug affects both versions.
Updated by Jonathan CLARKE about 12 years ago
- Assignee changed from Jonathan CLARKE to Nicolas CHARLES
Updated by Nicolas CHARLES about 12 years ago
- Status changed from Discussion to Pending technical review
The pull request is here
https://github.com/Normation/rudder-techniques/pull/6
Updated by Jonathan CLARKE about 12 years ago
- Status changed from Pending technical review to Released
Nicolas CHARLES wrote:
The pull request is here
https://github.com/Normation/rudder-techniques/pull/6
Looks good to me, merged.
Updated by Jonathan CLARKE about 12 years ago
- Status changed from Released to Pending release
Updated by Nicolas PERRON almost 12 years ago
- Status changed from Pending release to Released