by doublerebel
3567 days ago
Thanks for all the info. Any insight into the service discovery issues described in the docs [1]?

"Many existing RPC systems treat service discovery as a fully consistent process. To this end, they use fully consistent leader election backing stores such as Zookeeper, etcd, Consul, etc. Our experience has been that operating these backing stores at scale is painful."
[1]: https://lyft.github.io/envoy/docs/intro/arch_overview/servic...

We have had zero outages caused by our eventually consistent discovery system with active health checking (knock on wood), and we haven't really touched the discovery service code in months. It just runs.

I'm not saying that a system using ZK, etc. can't be made to work. It certainly can, since many companies do it. It's mostly that I think those solutions make the overall problem a lot more complicated and prone to failure than it has to be.
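To make the "eventually consistent discovery plus active health checking" idea concrete, here is a minimal Python sketch of one way the two signals could be combined. The policy is an assumption for illustration, not the actual discovery service or Envoy code: discovery results are treated as possibly stale, so a host missing from a discovery update is only purged once active health checks also fail for it, and any host passing health checks stays routable. The `HostSet` and `HostState` names are hypothetical, and newly discovered hosts are assumed healthy until a check says otherwise.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HostState:
    discovered: bool = True  # listed in the most recent discovery result
    healthy: bool = True     # assumption: new hosts count as healthy until a check fails


@dataclass
class HostSet:
    """Hypothetical host set merging discovery data with active health checks."""
    hosts: Dict[str, HostState] = field(default_factory=dict)

    def apply_discovery(self, discovered: Set[str]) -> None:
        # A stale or partial discovery result only marks hosts as absent;
        # it never removes a host on its own.
        for addr in discovered:
            self.hosts.setdefault(addr, HostState()).discovered = True
        for addr, state in self.hosts.items():
            if addr not in discovered:
                state.discovered = False
        self._purge()

    def apply_health_check(self, addr: str, ok: bool) -> None:
        # Record the result of an active health check for a known host.
        if addr in self.hosts:
            self.hosts[addr].healthy = ok
            self._purge()

    def _purge(self) -> None:
        # Drop a host only when discovery no longer lists it AND it fails checks.
        dead = [a for a, s in self.hosts.items() if not s.discovered and not s.healthy]
        for addr in dead:
            del self.hosts[addr]

    def routable(self) -> Set[str]:
        # Route to any healthy host, whether or not discovery currently lists it.
        return {a for a, s in self.hosts.items() if s.healthy}


if __name__ == "__main__":
    hs = HostSet()
    hs.apply_discovery({"10.0.0.1", "10.0.0.2"})
    hs.apply_discovery({"10.0.0.1"})            # stale result omits 10.0.0.2
    print(sorted(hs.routable()))                # 10.0.0.2 still routable
    hs.apply_health_check("10.0.0.2", False)    # now it is absent AND unhealthy
    print(sorted(hs.routable()))
```

The point of this design is that neither signal has to be perfectly consistent: a bad discovery update cannot take healthy hosts out of rotation, and a flapping health check cannot remove hosts that discovery still lists.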