|
|
|
|
|
by tptacek
229 days ago
|
|
Nope. Talk a little about how how Envoy's service discovery would scale to millions of apps in a global network? There's no way we found the only possible point in the solution space. Do they do something clever here? What we (think we) know won't work is a topologically centralized database that uses distributed consensus algorithms to synchronize. Running consensus transcontinentally is very painful, and keep the servers central, so that update proposals are local and the protocol can run quickly, subjects large portions of the network to partition risk. The natural response (what I think a lot of people do, in fact) is just to run multiple consensus clusters, but our UX includes a global namespace for customer workloads. |
|
> Running consensus transcontinentally is very painful
You don’t necessarily have to do that, you can keep your quorum nodes (lets assume we are talking about etcd) far enough apart to be in separate failure domains (fires, power loss, natural disasters) but close enough that network latency isn’t unbearably high between the replicas.
I have seen the following scheme work for millions of workloads:
1. Etcd quorum across 3 close, but independent regions
2. On startup, the app registers itself under a prefix that all other app replicas register
3. All clients to that app issue etcd watches for that prefix and almost instantly will be notified when there is a change. This is baked as a plugin within grpc clients.
4. A custom grpc resolver is used to do lookups by service name