by winsletts 3994 days ago
The problem with running etcd members on every Postgres node is that etcd clusters expect a fixed set of nodes or members. etcd doesn't function well in an environment where you could tear down and build up new nodes. Most of our Postgres service runs on AWS, so we must expect that any single node may vanish, and our system must replace that node. We tried running etcd alongside Postgres in an early prototype, but ran into issues with etcd cluster stability when destroying and recreating nodes. Thus, we opt for a standalone etcd cluster distinct from the Postgres cluster.

You can set up a local etcd proxy to mitigate this. You'd run the proxy listening on localhost, and then have it connected to the stable etcd cluster elsewhere.

The proxy can find the cluster manually or use SRV records. Autoscale the Postgres machines as much as you want after that while leaving etcd on stable machines.
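
To illustrate why the local proxy helps: the application on each Postgres machine only ever needs one fixed etcd address, no matter how the real cluster is laid out. Below is a minimal sketch, assuming the proxy listens on etcd's default client port 2379 and speaks the etcd v2 keys API. The key name in the usage note is made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local etcd proxy listens on the default client port.
LOCAL_PROXY = "http://127.0.0.1:2379"


def key_url(key, base=LOCAL_PROXY):
    """Build the etcd v2 keys API URL for a key, always via the local proxy."""
    return base + "/v2/keys/" + urllib.parse.quote(key.strip("/"))


def read_key(key, timeout=2.0):
    """Read a key's value through the proxy (needs a running proxy)."""
    with urllib.request.urlopen(key_url(key), timeout=timeout) as resp:
        return json.load(resp)["node"]["value"]
```

For example, `read_key("/service/pg/leader")` (a hypothetical key) would go to localhost, and the proxy forwards it to whichever stable etcd machines it knows about.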

That's basically what we are trying to do in the future. However, it's really hard to do if you want to have an etcd cluster with 5 nodes running all the time. You would need to check whether an etcd member has died, and then either promote a proxy to an etcd master or start a new machine (the latter is only possible in clouds or virtual environments).
You can do that trivially with Mesos and have it always ensure 5 instances are running. Bonus points: it will run identically on bare metal and across clouds, which means less vendor lock-in for you.
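
Whether you do it by hand or let a scheduler do it, keeping 5 members alive boils down to a reconciliation step like the one described above. Here is a rough sketch of just the decision logic. The names (`Plan`, `plan_recovery`) and the policy of preferring a new machine when one can be launched are assumptions for illustration, not anything etcd or Mesos provides.

```python
from dataclasses import dataclass
from typing import List

TARGET_MEMBERS = 5  # cluster size mentioned in the thread


@dataclass
class Plan:
    promote: List[str]  # proxies to turn into full members
    launch: int         # number of new machines to start


def plan_recovery(healthy_members, proxies, can_launch, target=TARGET_MEMBERS):
    """Decide how to get back to `target` healthy etcd members.

    Assumed policy: launch new machines when the environment allows it
    (cloud or virtual), otherwise promote existing proxies.
    """
    missing = max(0, target - len(healthy_members))
    if missing == 0:
        return Plan(promote=[], launch=0)
    if can_launch:
        return Plan(promote=[], launch=missing)
    # On bare metal we can only promote as many proxies as we have.
    return Plan(promote=list(proxies)[:missing], launch=0)
```

With four healthy members and no way to launch machines, `plan_recovery(["a", "b", "c", "d"], ["p1", "p2"], can_launch=False)` picks `p1` for promotion. Actually detecting a dead member and performing the promotion are the hard parts and are left out here.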