Bug #8351
closed
After the promises generation, cf-serverd config may not be reloaded, preventing new nodes from connecting
Description
When Rudder regenerates promises, it executes rudder-reload-cf-serverd as a post-hook to force a reload of cf-serverd.
In some cases, the hook may not do anything.
Coredumb reported, with Rudder 3.2 on CentOS 7, that the reload fails over time: it works for 3 weeks to 1 month, then fails to update the cf-serverd configuration.
Running rudder server debug for a new node allows it to access its data, and restarting cf-serverd also lets it fetch its data.
This happens on 3.2, but most likely on previous versions too.
Updated by Nicolas CHARLES over 8 years ago
- Assignee set to Alexis Mousset
Alexis,
You're the most familiar with this code. Could you look at it?
Updated by Alexis Mousset over 8 years ago
- Status changed from New to Discussion
The current reload has two limitations:
- It may take up to 1 minute on lightly loaded servers
- It does not occur until all threads are idle, which may never happen on loaded servers
On the other hand, a stop/start would break current connections, resulting in failed updates and file copies (and even worse with old agents that do not handle network errors correctly).
Updated by Vincent MEMBRÉ over 8 years ago
- Target version changed from 2.11.21 to 2.11.22
Updated by Jonathan CLARKE over 8 years ago
An ideal solution here would be to have a proxy in front of one or two cf-serverd processes that can phase out an existing cf-serverd when we know it's running an old config, and direct all new connections to the new cf-serverd.
haproxy?
Updated by Jonathan CLARKE over 8 years ago
- Assignee changed from Alexis Mousset to Jonathan CLARKE
Updated by Janos Mattyasovszky over 8 years ago
haproxy is a little bit overkill, don't you think?
You could use iptables rules and have them simply redirect to the newer instance after doing the update.
That would also keep existing connections alive, since only new incoming connections would be affected.
Just make sure cf-serverd listens on localhost (bindtointerface), so that no additional ports are open.
With this change, you'd also have to move the port information (port => "&COMMUNITYPORT&";) out of cf-serverd.st, and the external script would need to manage the ports of the running cf-serverd instances, since there could already be n running instances, all still serving open connections.
How do you know that a cf-serverd no longer has any clients and can be killed? How do you check the number of processes that are allowed to run? So many questions when you go from 1 to n... :)
Updated by Vincent MEMBRÉ over 8 years ago
- Target version changed from 2.11.22 to 2.11.23
Updated by Vincent MEMBRÉ over 8 years ago
- Target version changed from 2.11.23 to 2.11.24
Updated by Vincent MEMBRÉ over 8 years ago
- Target version changed from 2.11.24 to 308
Updated by Vincent MEMBRÉ over 8 years ago
- Target version changed from 308 to 3.1.14
Updated by Vincent MEMBRÉ about 8 years ago
- Target version changed from 3.1.14 to 3.1.15
Updated by Vincent MEMBRÉ about 8 years ago
- Target version changed from 3.1.15 to 3.1.16
Updated by Vincent MEMBRÉ about 8 years ago
- Target version changed from 3.1.16 to 3.1.17
Updated by Vincent MEMBRÉ about 8 years ago
- Target version changed from 3.1.17 to 3.1.18
Updated by Vincent MEMBRÉ almost 8 years ago
- Target version changed from 3.1.18 to 3.1.19
Updated by Benoît PECCATTE over 7 years ago
- Severity set to Major - prevents use of part of Rudder | no simple workaround
- User visibility set to Operational - other Techniques | Technique editor | Rudder settings
- Priority set to 52
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 3.1.19 to 3.1.20
Updated by Jonathan CLARKE over 7 years ago
- Status changed from Discussion to New
- Assignee deleted (Jonathan CLARKE)
Updated by Benoît PECCATTE over 7 years ago
- Target version changed from 3.1.20 to 4.2.0~beta1
- Effort required set to Medium
- Priority changed from 52 to 51
The iptables solution seems to be the best one:
- Run 2 instances of cf-serverd on 2 new ports, 5311 and 5312
- Have an iptables rule redirect to one of them: iptables -t nat -I PREROUTING -p tcp --dport 5309 -j REDIRECT --to-ports 5311
- Call reload on both instances and switch the destination port on promise generation
We must also:
- check that iptables is present and netfilter loaded at install time
- check that check-rudder-agent et al support more than 2 cf-serverd (don't forget cfengine enterprise)
So setting this to medium.
And since it is touchy, targeting this to next version.
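The port switch described above could be sketched as follows. This is only an illustration, not Rudder code: the function names are hypothetical, the commands are built as argument lists and never executed, and placing the rule in the nat table's PREROUTING chain is an assumption (REDIRECT is only valid in the nat table). The ports 5309, 5311 and 5312 come from the comment above.

```python
from __future__ import annotations

PUBLIC_PORT = 5309            # port the agents connect to
BACKEND_PORTS = (5311, 5312)  # the two cf-serverd instances


def redirect_rule(action: str, backend_port: int) -> list[str]:
    """Build one iptables command as an argv list (nothing is executed)."""
    return [
        "iptables", "-t", "nat", action, "PREROUTING",
        "-p", "tcp", "--dport", str(PUBLIC_PORT),
        "-j", "REDIRECT", "--to-ports", str(backend_port),
    ]


def switch_commands(active_port: int) -> tuple[int, list[list[str]]]:
    """Return the newly active port and the commands that switch to it."""
    if active_port not in BACKEND_PORTS:
        raise ValueError(f"unknown backend port: {active_port}")
    first, second = BACKEND_PORTS
    new_port = second if active_port == first else first
    commands = [
        # Insert the new rule first so there is never a moment without a redirect.
        redirect_rule("-I", new_port),
        # Then delete the rule pointing at the instance with the old config.
        redirect_rule("-D", active_port),
    ]
    return new_port, commands
```

Only new connections would be affected by the switch. Connections made from the server itself do not traverse PREROUTING, so they would likely need an equivalent rule in the OUTPUT chain.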
Updated by Alexis Mousset over 7 years ago
- Target version changed from 4.2.0~beta1 to 4.2.0~beta2
- Priority changed from 51 to 50
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.2.0~beta2 to 4.2.0~beta3
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.2.0~beta3 to 4.2.0~rc1
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.2.0~rc1 to 4.2.0~rc2
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.2.0~rc2 to 4.2.0
Updated by Vincent MEMBRÉ about 7 years ago
- Target version changed from 4.2.0 to 4.2.1
Updated by Vincent MEMBRÉ about 7 years ago
- Target version changed from 4.2.1 to 4.2.2
Updated by Vincent MEMBRÉ about 7 years ago
- Target version changed from 4.2.2 to 4.2.3
- Priority changed from 50 to 56
Updated by Vincent MEMBRÉ about 7 years ago
- Target version changed from 4.2.3 to 4.2.4
Updated by Vincent MEMBRÉ almost 7 years ago
- Target version changed from 4.2.4 to 4.2.5
Updated by Vincent MEMBRÉ over 6 years ago
- Target version changed from 4.2.5 to 4.2.6
Updated by Vincent MEMBRÉ over 6 years ago
- Target version changed from 4.2.6 to 4.2.7
Updated by Vincent MEMBRÉ over 6 years ago
- Target version changed from 4.2.7 to 414
Updated by Vincent MEMBRÉ over 6 years ago
- Target version changed from 414 to 4.3.4
Updated by Benoît PECCATTE over 6 years ago
- Target version changed from 4.3.4 to 4.3.5
Updated by Vincent MEMBRÉ about 6 years ago
- Target version changed from 4.3.5 to 4.3.6
Updated by Vincent MEMBRÉ about 6 years ago
- Target version changed from 4.3.6 to 4.3.7
Updated by Vincent MEMBRÉ about 6 years ago
- Target version changed from 4.3.7 to 4.3.8
- Priority changed from 56 to 0
Updated by Vincent MEMBRÉ almost 6 years ago
- Target version changed from 4.3.8 to 4.3.9
Updated by Alexis Mousset almost 6 years ago
- Target version changed from 4.3.9 to 4.3.10
Updated by François ARMAND almost 6 years ago
- Target version changed from 4.3.10 to 4.3.11
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 4.3.11 to 4.3.12
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 4.3.12 to 4.3.13
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 4.3.13 to 4.3.14
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 4.3.14 to 587
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 587 to 4.3.14
Updated by Alexis Mousset over 5 years ago
- Target version changed from 4.3.14 to 5.0.13
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.13 to 5.0.14
Updated by Vincent MEMBRÉ about 5 years ago
- Target version changed from 5.0.14 to 5.0.15
Updated by Vincent MEMBRÉ about 5 years ago
- Target version changed from 5.0.15 to 5.0.16
Updated by Alexis Mousset almost 5 years ago
- Target version changed from 5.0.16 to 5.0.17
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 5.0.17 to 5.0.18
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 5.0.18 to 5.0.19
Updated by Vincent MEMBRÉ over 4 years ago
- Target version changed from 5.0.19 to 5.0.20
Updated by Alexis Mousset about 4 years ago
- Target version changed from 5.0.20 to 6.2.0~beta1
Updated by Alexis Mousset about 4 years ago
Options are:
- restart instead of reload + two-steps policy update in the agent
- implementing graceful restart properly in cf-serverd
- switch between two instances using iptables
- prevent new connection using iptables until reload is done (with a timeout)
Updated by Janos Matya about 4 years ago
Is it an architectural decision to keep using cf-serverd? Is it beneficial in the long term?
Would there be any long-term benefit in switching over to some kind of https-based file serving?
Updated by François ARMAND about 4 years ago
We were discussing that idea not two weeks ago. As a side bonus, it would allow consistent handling of the cfengine and DSC agents.
It would also allow us to clearly define our own protocol and its elements (for example, something different for remote run, with clearly defined, limited commands, again identical for cfengine and DSC).
Updated by Vincent MEMBRÉ about 4 years ago
- Target version changed from 6.2.0~beta1 to 6.2.0~rc1
Updated by François ARMAND about 4 years ago
- Target version deleted (6.2.0~rc1)
Updated by Benoît PECCATTE about 4 years ago
There is another solution: use systemd to do a graceful reload.
We can use systemd's socket activation to pass sockets to cf-serverd. This is easy to implement: https://github.com/puma/puma/blob/master/docs/systemd.md and https://insanity.industries/post/socket-activation-all-the-things/
From there, it should be possible to let systemd handle port opening and incoming connections, and let it pass those connections to the right cf-serverd instance.
There is a blog post by Lennart that says systemd can do this kind of graceful restart, but systemd waits for the process to fully stop before starting a new one, which fails to provide the feature.
However, there are workarounds to make it work: this post in French, https://vincent.bernat.ch/fr/blog/2018-systemd-golang-socket-activation, explains them. They all revolve around making systemd ignore that the process has not finished stopping.
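For readers unfamiliar with socket activation, here is a minimal sketch of the receiving side of the protocol, written in Python purely for illustration (cf-serverd is written in C, and this is not how the upstream change is implemented). systemd passes the listening sockets as inherited file descriptors starting at 3 and announces them through the LISTEN_PID and LISTEN_FDS environment variables. The function names are hypothetical.

```python
from __future__ import annotations

import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_fds(environ=None, pid=None) -> list[int]:
    """Return the descriptors passed by systemd, or [] if there are none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to the process systemd started directly.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def listening_sockets() -> list[socket.socket]:
    """Wrap each inherited descriptor in a socket object, ready for accept()."""
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

Because the listening socket belongs to systemd rather than to the daemon, a new process can start accepting connections on it while an older process finishes serving the connections it already holds.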
Updated by Alexis Mousset almost 4 years ago
- Status changed from New to In progress
- Assignee set to Alexis Mousset
Updated by Alexis Mousset almost 4 years ago
To sum things up, what happens is:
- We add a new node to Rudder, we generate policies for it
- cf-serverd ACLs are updated to allow it to connect
- cf-serverd policies are updated (on root or relay), with these new ACLs
- Then the cf-serverd process is supposed to detect the configuration change within a minute, and reload it. The problem is that cf-serverd cannot reload config with open connections (due to technical limitations that would probably be hard to overcome), so it does not even try to check if reload is needed when there is at least one connected node. On moderately loaded relays it may work eventually, but as new connections are not prevented, it is easy to completely skip configuration reload indefinitely.
- A service reload (with SIGHUP) has exactly the same limit, and will be ignored if there are open connections.
- A service restart fixes the problem, but will break existing connections, potentially leading to policy update and file copy errors on the connected nodes.
So we actually need two things:
- A way to properly reload config.
  - It could use systemd socket activation, as we have systemd on all relays and root servers. In this case we would spawn a new cf-serverd with the new config when a configuration reload is required, and let the old process handle existing connections. We would need to ask it to terminate when all connections are closed (a feature that does not exist for now) to avoid piling up.
- A way to detect a reload is needed, from outside of cf-serverd.
- It is already done on root server using a policy generation hook.
- On relays we might need to rely on policy update repairs, to trigger a reload by systemd
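The missing "terminate when all connections are closed" behaviour could be modelled like this. It is a toy sketch of the bookkeeping only, with hypothetical names, and does not reflect the actual cf-serverd implementation.

```python
class DrainingServer:
    """Tracks open connections for a process that may be asked to retire."""

    def __init__(self) -> None:
        self.open_connections = 0
        self.retiring = False

    def connection_opened(self) -> None:
        # A retiring process no longer accepts: the new process does.
        if self.retiring:
            raise RuntimeError("retiring process must not accept connections")
        self.open_connections += 1

    def connection_closed(self) -> None:
        if self.open_connections == 0:
            raise RuntimeError("no open connection to close")
        self.open_connections -= 1

    def retire(self) -> None:
        """Called when a process with the new configuration has been spawned."""
        self.retiring = True

    def should_exit(self) -> bool:
        """True once the process is retiring and has served its last client."""
        return self.retiring and self.open_connections == 0
```

Unlike the current reload, which waits for an idle moment that may never come, retirement stops new connections from reaching the old process, so its connection count can only decrease and it is guaranteed to exit once its clients are done.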
Updated by Alexis Mousset almost 4 years ago
Benoît PECCATTE is working on implementing the systemd socket activation.
Upstream PR https://github.com/cfengine/core/pull/4499.
Updated by Vincent MEMBRÉ almost 4 years ago
- Status changed from In progress to Pending release
Updated by Vincent MEMBRÉ almost 4 years ago
- Status changed from Pending release to New
Updated by Benoît PECCATTE almost 4 years ago
- Status changed from New to In progress
- Assignee changed from Alexis Mousset to Benoît PECCATTE
Updated by Benoît PECCATTE almost 4 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Benoît PECCATTE to Alexis Mousset
- Pull Request set to https://github.com/Normation/rudder-packages/pull/2424
Updated by Benoît PECCATTE almost 4 years ago
- Status changed from Pending technical review to Pending release
Applied in changeset rudder-packages|2204c8b13931999c59be3822d705f0c7c3f6efe9.
Updated by Vincent MEMBRÉ almost 4 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 6.1.10 and 6.2.3 which were released today.