|
(I help maintain SmartStack.)

I think it's really interesting that "what we've already got set up" is such a big driver of which systems we pick. For example, in 2013 Yelp already had hardened ZooKeeper setups and Consul didn't exist ... and when it did exist, Consul was the new "oh gosh, they implemented their own consensus protocol" kid on the block, so we opted for what we felt was the safer option. I was also pretty worried about the Ruby ZK library, but to be honest it's been relatively well behaved, aside from the whole sched_yield bug [1] occasionally causing Nerves to loop infinitely while shutting down. We fixed that with a heartbeat and a watchdog, so not too bad. Which technologies are available at which times really drives large technical choices like this.

Consul template is undeniably useful, especially when you start integrating it with other HashiCorp products like Vault for rolling the SSL creds on all your distributed HAProxies in real time. And I think that the whole HashiCorp ecosystem together is a really powerful set of free, off-the-shelf tools that are really easy to get going with.

I do think, however, that Synapse has some important benefits, specifically around managing dynamic HAProxy configs that have to run on every host in your infra. For example, Synapse can remove dead servers ASAP through the HAProxy stats socket after getting realtime ZK push notifications, rather than relying on healthchecks (in production this takes under roughly 10s across the fleet, which is crucial because if HAProxy healthchecked every 2s we'd kill our backend services with healthcheck storms ... because we've totally done that ...). Synapse can also try to remember old servers so that temporary flakes don't result in HAProxy reloads, and it can try to spread and jitter HAProxy restarts so that the healthcheck storms have less impact, all while keeping flexibility in the registration backend (Synapse supports any service registry that can implement the interface [2]).

However, there are some pretty cool alternative proxies to HAProxy out there, and one area where Consul is doing really well is supporting arbitrary manifestations of service registration data using Consul template; SmartStack is still playing catch-up there, supporting only HAProxy and JSON files (with arbitrary outputs on their way in [3]).

I enjoyed the article, and thank you to the Stripe engineers for taking the time to share your production experiences! I'm excited to see folks talking about the kinds of real-world production issues that you have to deal with to build reliable service discovery.

[1] https://github.com/zk-ruby/zk/issues/50
[2] https://github.com/airbnb/synapse/blob/master/lib/synapse/se...
[3] https://github.com/airbnb/synapse/pull/203 |
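To make the stats-socket and jitter points above a bit more concrete, here is a minimal Python sketch. It is not Synapse's actual code (Synapse is written in Ruby), and the function names, parameters, and socket path handling are all hypothetical. It assumes that "remembered" servers stay in the HAProxy config, so a flapping server only needs an `enable server` or `disable server` command over the stats socket, and a reload is needed only when a server appears that HAProxy has never seen.

```python
import random
import socket


def plan_socket_commands(backend, known, live):
    """Plan stats-socket commands so the servers HAProxy knows match the registry.

    `known` is the set of server names already in the HAProxy config
    (including remembered old servers); `live` is the set the registry
    currently reports. Both names are illustrative assumptions.
    """
    commands = []
    for name in sorted(known):
        # Known servers are toggled in place, which avoids a reload.
        verb = "enable" if name in live else "disable"
        commands.append(f"{verb} server {backend}/{name}")
    # A server HAProxy has never seen cannot be toggled, so it forces a reload.
    needs_reload = bool(set(live) - set(known))
    return commands, needs_reload


def jittered_delay(base_seconds, max_jitter_seconds, rng=random):
    """Delay before a reload, spread out so hosts do not all restart together."""
    return base_seconds + rng.uniform(0, max_jitter_seconds)


def send_commands(socket_path, commands):
    """Send each command to the HAProxy stats socket, one connection per command."""
    for command in commands:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(socket_path)
            sock.sendall((command + "\n").encode())
            sock.recv(4096)  # read (and discard) HAProxy's reply
```

In this sketch a host would call `plan_socket_commands` on every registry notification, send the commands immediately, and only schedule a reload (after `jittered_delay`) when `needs_reload` is true.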
I disagree. That's a band-aid solution, good for a short time while you figure out the root cause and solve it for real.