Hacker News
by twoy 4096 days ago
If you don't want to manage ZooKeeper manually to bootstrap a Mesos cluster, you need another Mesos cluster, haha.

Or just use Exhibitor. I don't remember the last time I manually anything'd with Zookeeper.

Between Exhibitor and Curator, I honestly find Zookeeper so straightforward and easy to work with that I don't quite understand the popularity of etcd.

I'm running a small dockerized Exhibitor/Zookeeper cluster (5 nodes, several hundred clients), and it's extremely straightforward. Etcd isn't what we'd consider production-grade yet.
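Some context on the 5-node sizing: ZooKeeper and etcd both need a majority of members (a quorum) to keep accepting writes, so an ensemble of n members survives n - (n/2 + 1) failures. A minimal Go sketch of that arithmetic; `quorumSize` and `faultTolerance` are hypothetical helper names, not functions from either project.

```go
package main

import "fmt"

// quorumSize returns how many members must agree for a majority-quorum
// ensemble of n members (the model ZooKeeper and etcd use) to make progress.
func quorumSize(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while the survivors
// can still form a quorum.
func faultTolerance(n int) int {
	return n - quorumSize(n)
}

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d failure(s)\n",
			n, quorumSize(n), faultTolerance(n))
	}
}
```

Going from 3 to 4 members buys no extra tolerance (both survive one failure), while 5 survives two, which is why odd ensemble sizes like 5 are the usual choice.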
Having worked with etcd in production for the last few months, I have to agree. The CoreOS stack needs some more time to marinate.
Thanks for this comment. Glad to know I made the right choice.

I'm not saying etcd won't ever overshadow Zookeeper (it probably will, with the momentum behind it), but as an ops guy, I wasn't willing to bet production application service discovery on it.

My distaste for the Go community is pretty well-established in these parts; I think worse-is-better is screwing us all, and etcd seems to me to be the worse-is-better Zookeeper. And for things that don't matter, sure, worse-is-better your life away; a Rails app can be whatever you want, but the infrastructure I manage had better be bulletproof. I won't say etcd will never be competitive, but without some significant changes, I don't see it getting my vote--and those changes are largely around the parts of the feature set that etcd doesn't support, at which point...why use it, anyway?
What particular issues have you run into with etcd and/or CoreOS?
Lots of split brains. Serious bugs making it through the alpha and beta channels into stable (and our boxes auto-updating only to become useless). Fleet units dying purely due to problems with fleetd/systemd. A particularly painful one was an Akka deployment on top of CoreOS where a sidekick unit would fail to start because fleet hadn't actually copied the unit file to the remote host. It only happened with sidekicks, but due to how we ran our networking, it effectively killed the application. Almost every redeploy required manually getting fleet to copy the unit over.
Just to add on: I've had fleet misreport unit status and btrfs report a lack of disk space for no apparent reason. There's also the inability to restart individual failed units that are part of a global unit.

Also, there was that one time they changed how cloud-config was parsed, so if "#cloud-config" wasn't on the very first line without preceding spaces, initialisation would fail. That was when I switched the reboot strategy to manual.
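A breakage like that can be caught before a reboot with a pre-flight check on the user data. A minimal Go sketch, assuming the rule is exactly as described in the comment (the literal `#cloud-config` must open the very first line, with nothing before it); `hasCloudConfigHeader` is a hypothetical helper, not part of coreos-cloudinit.

```go
package main

import (
	"fmt"
	"strings"
)

// hasCloudConfigHeader is a hypothetical pre-flight check. Because the marker
// contains no newline, a plain prefix test on the whole file is equivalent to
// "the first line starts with #cloud-config, with no leading spaces".
func hasCloudConfigHeader(data string) bool {
	return strings.HasPrefix(data, "#cloud-config")
}

func main() {
	fmt.Println(hasCloudConfigHeader("#cloud-config\nhostname: core-01\n"))   // true
	fmt.Println(hasCloudConfigHeader("  #cloud-config\nhostname: core-01\n")) // false: leading spaces
	fmt.Println(hasCloudConfigHeader("\n#cloud-config\n"))                    // false: not on the first line
}
```

Running something like this in CI against the cloud-config you ship is cheap insurance against a parser getting stricter under you.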

Matches up pretty well with my experience, too. I do not trust fleet as far as I can throw it.
> Between Exhibitor and Curator, I honestly find Zookeeper so straightforward and easy to work with that I don't quite understand the popularity of etcd.

Tools like etcd and consul fit into the Unix philosophy of small, composable tools. Zookeeper is more a part of the Enterprise Java philosophy which many people have written off for various reasons, both rational and irrational.

Having run Zookeeper in the past and now having run Consul in production for the past ~6 months, I can't imagine ever running Zookeeper again, unless I'm using a tool that's built on top of it. Consul is just easier to use/maintain and we've yet to run into any problem with it. Zero problems in six months. In all the time I ran Zookeeper, I could never say that.

Wait a tick, how does consul fit into the "Unix philosophy" (which is asinine, bullshit, and wrongheaded, in whatever order you please, but set that aside) but Zookeeper doesn't? Your description of running Consul problem-free but being stricken with issues with Zookeeper is foreign to me, but okay, anecdotes, but this is bonkers, man. Consul being a DNS server and a K/V store and a health checker is so not-Unix-philosophy it should hurt you physically to say that.

I mean, Consul is fine for what it does. I've used both, whatever. But if you're going to have something that does a bunch of things, I'd much rather have the one that supports the primitives to do what somebody needs, rather than trying to do it all itself.

(Personally, after trying to work with Terraform, I don't much trust Hashicorp's attempts to write code I have to rely on to work correctly and never ever break. YMMV, of course.)

> how does consul fit into the "Unix philosophy"

- Single binary executable

- Compatible with ps (Zookeeper has the traditional java problem of showing up as .../bin/java followed by 4 lines of classpath)

- Arguments to consul don't need to be prefaced with -D (another common java problem)

- Passing -h to consul actually helps you figure out how to run it.

Oh, and the download is 1/3 the size of Zookeeper, and the executable includes the Go runtime whereas Zookeeper's java runtime is separate.

Etcd is actually much closer to the Unix philosophy. Consul seems to go more in the direction of similar Go tools like Docker, where it bundles related activities together into one executable. But, then again, parts of the Unix ecosystem do this too (openssl, for one).

You could run it on top of something like CoreOS, but then you're just swapping Zookeeper for etcd.