User story #6910
open
Only use one copy of system techniques
Added by Florian Heigl about 9 years ago.
Updated over 8 years ago.
Category:
Performance and scalability
Description
I've been looking at the runtimes of cf-agent on one especially long-running relay.
The agent would spend 1000s of seconds comparing the generated promises for each node.
But I suddenly had an idea:
Most of those files are identical, since our policies don't differ much between nodes yet.
So why not have cf-serverd serve only one instance of the system techniques, instead of one per node?
On larger Rudder relays this would remove tens of thousands of files from the cf-serverd config, and would cut the comparisons on the relay from tens of thousands down to tens.
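To put a number on "most of those files are identical", something like the following could be run against the directory holding the per-node policy copies. It is only a sketch: the layout (`share_root/<node>/...`) and the function names are assumptions for illustration, not Rudder's actual paths or tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def duplication_report(share_root: Path) -> dict:
    """Count files under share_root and how many distinct contents they have.

    share_root is assumed (hypothetically) to hold one subdirectory per node.
    """
    by_digest = defaultdict(list)
    for path in sorted(share_root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    total = sum(len(paths) for paths in by_digest.values())
    return {"total_files": total, "unique_contents": len(by_digest)}
```

A large gap between `total_files` and `unique_contents` would indicate how much a single shared copy could save.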
Hi Florian,
Unfortunately, system techniques are different for each node, so this would be quite a complex change. But there are indeed some parts we could move out into common libraries (a bit like what we've done with ncf) to lower the burden.
A bigger win would be to:
- use the promises generated file to check if the files changed or not (big win when nothing changes)
- tar.gz the promises for the relay (huge win when something changes)
What do you think of it?
- I think "commonization" will improve performance on multiple ends, thus be worth it for the long-term benefits.
- I'm very much in support of either tar.gz or rsync; my understanding is that for relays only the normal cf transport needs to be torn out and replaced. We've discussed this a little, and the tricky bit seems to be 'flipping' the directories on the relay after the transfer, so that agents in mid-download won't get an inconsistent policy state with 3 files from the old one and 5 from the newer one.
- I don't think promises_generated is any good at the moment; it'll need all the other improvements to make sure promise updates are more granular. (It only has value if promises are regenerated for few nodes, otherwise it will have no effect.) Prod might see a new policy generation ~20 times a day just from new servers being added.
Personally I think rsync could be interesting since it can do hard links but tar is simpler and might be easier to transfer.
I've checked what tarball size we're looking at. The compressed (gzip) tarball is 62 MB and took about 50 seconds to build. The uncompressed size is around 500 MB.
(This is still 'no rules except system techniques' kind of state)
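One common way to do the directory 'flip' mentioned above is to unpack each generation into its own directory and repoint a symlink with a rename. The sketch below assumes a POSIX filesystem. The directory names and the `current` link are hypothetical, not something Rudder or CFEngine provides.

```python
import os
from pathlib import Path


def flip_current(base: Path, new_release: str, link_name: str = "current") -> None:
    """Atomically repoint base/<link_name> at base/<new_release>.

    Names are illustrative; new_release must already be fully transferred.
    """
    target = base / new_release
    if not target.is_dir():
        raise FileNotFoundError(target)
    tmp_link = base / (link_name + ".tmp")
    # Clear a leftover temporary link from an interrupted earlier run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(new_release, tmp_link)       # relative target, resolved inside base
    os.replace(tmp_link, base / link_name)  # rename(2) swaps the link in one step
```

This only makes the switch itself atomic. A client that resolves the link again for every file could still straddle two generations, so the old directory should stay in place until running downloads have finished.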
Commonization is a win in every scenario, so this is indeed on the roadmap
tar.gz would ensure consistency - we'd only copy the tar.gz file, and it is atomic - however, we should have one tar.gz per node in my opinion (otherwise we will transfer too much data on the network, I'm afraid)
promises_generated has a big advantage: it is really small, so sha1'ing it is really fast compared to sha1'ing the tar.gz. It would only be useful when combined with tar.gz
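A minimal sketch of how the two ideas could combine: one tarball per node, published with a rename so readers never see a half-written file, plus a cheap SHA-1 check on the small marker file to decide whether the tarball needs fetching at all. The file and function names here are assumptions for illustration only.

```python
import hashlib
import os
import tarfile
from pathlib import Path
from typing import Optional


def sha1_of(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_node_tarball(node_dir: Path, out_dir: Path) -> Path:
    """Pack one node's promises into out_dir/<node>.tar.gz.

    The archive is written under a temporary name and renamed into place,
    so a reader sees either the old tarball or the complete new one.
    """
    final = out_dir / (node_dir.name + ".tar.gz")
    tmp = out_dir / (node_dir.name + ".tar.gz.tmp")
    with tarfile.open(tmp, "w:gz") as tar:
        tar.add(node_dir, arcname=node_dir.name)
    os.replace(tmp, final)
    return final


def needs_update(marker: Path, last_seen_sha1: Optional[str]) -> bool:
    """True if the small marker file differs from the digest seen last time."""
    return last_seen_sha1 is None or sha1_of(marker) != last_seen_sha1
```

Hashing only the marker keeps the "nothing changed" case cheap. The tarball itself is read only when the marker says something changed.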
Re, with some remainder of brains:
1)
cf-serverd so far does not reuse its TCP socket, which is sucking up notable time - and is stupid.
I don't know what to write in an upstream bug, but maybe you could give it a second look and consider opening one on the CFEngine end.
2)
I think the approach with a hash map of file checksums is better than doing single comparisons; this would move us to a list of files that need to be transferred instead of still checking each node's checksums.
So get checksums of all tarballs, send them, fetch needed tarballs - for example.
I'd nonetheless prefer something that does one "sync" than seeing it handle hundreds or thousands of tarballs.
3) Checksums are definitely a good thing though :)
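The checksum-map idea from point 2 could be sketched roughly as follows: the policy server publishes one map of tarball name to checksum, and the relay compares it against its own map in a single pass to get the list of tarballs to fetch. The function names and the shape of the maps are assumptions, not an existing Rudder or CFEngine interface.

```python
from typing import Dict, List


def tarballs_to_fetch(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Names whose checksum is new or differs from the local copy."""
    return sorted(name for name, digest in remote.items()
                  if local.get(name) != digest)


def stale_tarballs(remote: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Names present locally that the server no longer publishes."""
    return sorted(name for name in local if name not in remote)
```

With this, the cost of an unchanged node is one dictionary lookup rather than one network comparison per file.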
Just confirming it:
It has to be one file per node (for the node-specific parts), because otherwise we'd lose the big advantage of avoiding information leaks (host A only gets info for host A).
- Target version set to Ideas (not version specific)