User story #9796

closed

Relay Load balancing with HA

Added by Janos Mattyasovszky about 8 years ago. Updated almost 3 years ago.

Status:
Backlog
Priority:
N/A
Assignee:
-
Category:
Performance and scalability

Description

Hi,

A large environment is only manageable if you deploy multiple relays, so that the thousands of nodes do not all hammer the root server. You can also secure the root server better if only a small set of IPs needs to be able to access it.

You can assign relays based on security zone, geo-location, and the number of nodes that need to connect to each of them.

However, nodes are almost never evenly distributed across these criteria, so some relays end up with peaks of similar "types" of nodes.

For example, this is a real distribution (part of each relay UUID, followed by the number of nodes connected to it):
  • 6a81cf67da3d 486
  • e110a8eac053 13
  • 16992e66f373 172
  • afb7081228de 3350
  • 4a9f6afea17c 133
  • 2555476521fb 2304
  • 89ec081228de 856
  • 76b2081228de 689
  • 089053ddd718 588

As you can see, there are big peaks of 2k+ nodes connected to a single relay, because those nodes are "logically identical" based on the requirements.
The two outliers are also basically in the same "zone"; we only separated them by something we could automate, but essentially 5600+ nodes would require the same relay.
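The figures above can be checked with a few lines of Python. This is only a sketch over the numbers quoted in the list; the variable names are made up for illustration.

```python
# Node counts per relay, copied from the list above
# (truncated relay UUID -> number of connected nodes).
nodes_per_relay = {
    "6a81cf67da3d": 486,
    "e110a8eac053": 13,
    "16992e66f373": 172,
    "afb7081228de": 3350,
    "4a9f6afea17c": 133,
    "2555476521fb": 2304,
    "89ec081228de": 856,
    "76b2081228de": 689,
    "089053ddd718": 588,
}

total = sum(nodes_per_relay.values())

# The two busiest relays are the ones that logically belong to the same "zone".
largest_two = sorted(nodes_per_relay.values(), reverse=True)[:2]
share = sum(largest_two) / total

print(f"total nodes: {total}")                        # total nodes: 8591
print(f"two largest relays: {sum(largest_two)} nodes")  # two largest relays: 5654 nodes
print(f"share of all nodes: {share:.1%}")             # share of all nodes: 65.8%
```

So roughly two thirds of all nodes sit behind just two of the nine relays.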

The idea:
  1. Create the concept of "zones" that describe an area such as "TEST", "QA", "DMZ-1" or "REGULAR", and put the IP subnet restrictions on this level. The default zone could be "default" :-)
  2. Assign the relays to these zones, so that they inherit the description and IP subnet.
  3. The node registers with an existing policy server, which puts it in a zone; or, if #7876 is implemented, you can define the "zone" on allocation (otherwise the node gets "default"), and the response would contain the relay that has the fewest nodes on it.
  4. Sync the policy to all relays in a zone.
  5. The node receives the list of all possible relays in its "zone" and randomly chooses one of them that is available (ports 443 and 5309 open?).
This would:
  • ensure that you can do rolling upgrades, and that nodes can handle the outage of a relay as long as there are others in the same "zone"
  • distribute the load (hopefully) evenly among the relays in the same "zone"
  • only extend the current setup: today you have one relay per node, and this just extends '1' to 'n'
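To make steps 3 and 5 of the idea more concrete, here is a minimal sketch in Python. It is not based on any existing Rudder API: `Relay`, `Zone`, `allocate` and `pick_relay` are hypothetical names, the example subnet is invented, and the availability check is passed in as a function so that the real test (for instance, whether ports 443 and 5309 are reachable) can be plugged in later.

```python
import random
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Relay:
    uuid: str
    node_count: int = 0


@dataclass
class Zone:
    name: str
    subnets: list                      # allowed node subnets, as CIDR strings
    relays: list = field(default_factory=list)

    def allows(self, node_ip):
        """True if the node's IP falls into one of the zone's subnets."""
        return any(ip_address(node_ip) in ip_network(s) for s in self.subnets)


def allocate(zone, node_ip):
    """Step 3 (server side): return the relay with the fewest nodes."""
    if not zone.allows(node_ip):
        raise ValueError(f"{node_ip} is not allowed in zone {zone.name!r}")
    relay = min(zone.relays, key=lambda r: r.node_count)
    relay.node_count += 1
    return relay


def pick_relay(zone, is_available, rng=random):
    """Step 5 (node side): try the zone's relays in random order and
    return the first available one, or None if none can be reached."""
    candidates = list(zone.relays)
    rng.shuffle(candidates)
    for relay in candidates:
        if is_available(relay):
            return relay
    return None


# Hypothetical zone holding the two overloaded relays from the list above.
zone = Zone(
    name="REGULAR",
    subnets=["10.0.0.0/8"],            # assumed subnet, for illustration only
    relays=[Relay("afb7081228de", 3350), Relay("2555476521fb", 2304)],
)

assigned = allocate(zone, "10.1.2.3")
print(assigned.uuid)                   # 2555476521fb (the less loaded relay)

# Simulate an outage of one relay: the node falls back to the other one.
fallback = pick_relay(zone, lambda r: r.uuid != "2555476521fb")
print(fallback.uuid)                   # afb7081228de
```

Least-loaded allocation evens out the node counts over time, while the random choice on the node side is what lets a node survive the outage of a single relay during a rolling upgrade.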
Actions #1

Updated by Benoît PECCATTE almost 8 years ago

  • Category set to Performance and scalability
Actions #2

Updated by Janos Mattyasovszky about 7 years ago

  • Description updated (diff)
Actions #3

Updated by François ARMAND almost 3 years ago

  • Status changed from New to Backlog

This is not tracked in our roadmap; we will change the status if it is prioritized.
