by bashcoder 4442 days ago
I know that there are lots of heavyweight folks who swear by Zookeeper as both a reliable and powerful tool, and for good reason. Unfortunately the docs can be fairly inscrutable, even for experts, and it typically requires the maintenance of a separate cluster of Zookeeper nodes.

So I like that etcd is a fundamental component of CoreOS, with these features:

  1. Written from scratch in Go
  2. Implements the Raft protocol for node coordination
  3. Has a useful command line app
  4. Runs on every node in a cluster
  5. Enables auto-discovery
  6. Allows CoreOS nodes to share state

No offense, I just don't get it: Why is 1) a feature for you? Everything else on the list kinda makes sense (I understand that this describes something I'd call a feature), but 'written from scratch' or 'in Go'?

Can you explain what excites you about that?

I listed what I like. "Excites" is your word.

I guess what I meant by "from scratch" is that they aren't burdened by legacy code, and aren't limited to using the Paxos algorithm.

If you look at Deis, for example, it basically outsources a lot of node management to Chef Server, which in my view creates a great deal of technical debt on day one.

You read negative connotations into 'excites'. That wasn't intended.

I was just curious, since 'from scratch' can just as well mean 'untested', although I certainly agree that it is sometimes the Right Thing. The reference to Go was another thing that threw me off, since I rarely (admittedly .. sometimes) judge software projects by the language they are written in.

Thank you for the answer and some more references.

> and aren't limited to using the Paxos algorithm

ZK uses the ZAB protocol, which is similar to, but not the same as, Paxos.

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs...

Probably the fact that you get a single binary (unlike with interpreted languages), and that it isn't a mix of Go and C, meaning no worries about C libraries.
Just to note, Zookeeper is written in Java.
Yes, I'm aware. I was just answering the question, not to compare it to anything else.

Still, Java requires another dependency: the JVM. Go binaries require... well, nothing.

Personally, my preference for Zookeeper comes from the API. To me, the ZK API and docs were far more understandable than the etcd ones. The etcd API docs appear to be a collection of examples, not reference docs. They do a poor job of explaining the possible operations and what the various options do, particularly which combinations of options are allowed.

In fact, I had to resort to running test queries against a running etcd server just to work out the proper semantics of some of the arguments.
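
For anyone wanting to do the same kind of probing, here is a rough sketch of how the test queries could be generated. It assumes the v2 keys endpoint (`/v2/keys/<key>`) with form-encoded `value`, `ttl`, `prevExist`, and `prevValue` parameters, and a local server on port 4001; the `build_put` helper, the key name, and the list of probes are made up for illustration. It only builds and prints the requests, so nothing is sent until you pass one to `urllib.request.urlopen`.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_put(base_url, key, value, **options):
    """Build (but do not send) a PUT against the assumed v2 keys endpoint."""
    body = urlencode({"value": value, **options}).encode()
    url = f"{base_url}/v2/keys/{quote(key)}"
    return Request(url, data=body, method="PUT")


# Hypothetical option combinations whose behaviour we want to observe.
probes = [
    {},
    {"ttl": 5},
    {"prevExist": "false"},
    {"prevValue": "old"},
    {"prevExist": "false", "prevValue": "old"},
]

for opts in probes:
    req = build_put("http://127.0.0.1:4001", "probe/key", "new", **opts)
    # Print what would be sent; use urllib.request.urlopen(req) to send it.
    print(req.get_method(), req.full_url, req.data.decode())
```

Sending each request in turn and recording the status code and response body gives a small table of which combinations the server accepts, which is roughly what I ended up doing by hand.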

> 4. Runs on every node in a cluster

Is that so? When looking at it I distinctly remember it advising running a set of 3-9 nodes of etcd (not necessarily separate from other things).

The functionality to handle this is mentioned in the blog post: "standby" peer mode.

"Our upcoming release, etcd 0.4, adds a new feature, standby mode. It is an important first step in allowing etcd clusters to grow beyond just the peers that participate in consensus."

Fair enough, it just sounded like the GP was describing something inherent rather than something new, and it didn't mesh with my understanding of how etcd worked to date.
I don't know that that's necessarily a good idea.

As you (perhaps automatically) expand and collapse the cluster, you'll need to make sure to communicate to all nodes what the new cluster size is. If some nodes don't know the correct quorum count, split-brain!
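
To make the quorum concern concrete, here is a toy model. It is not how etcd or Raft actually tracks membership; the function names and the scenario are invented for illustration. It only shows the arithmetic: a majority is `size // 2 + 1`, and nodes that disagree about `size` can each conclude they hold a majority.

```python
def quorum(cluster_size):
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def thinks_it_has_quorum(reachable, believed_size):
    """True if a node that can reach `reachable` members (itself included)
    would conclude it holds a majority, given the size it believes in."""
    return reachable >= quorum(believed_size)


# Hypothetical scenario: the cluster has grown from 3 to 5 members, but two
# of the nodes never learned about the change and still believe the size is 3.
# A network partition then splits the members 3 / 2.
side_a = thinks_it_has_quorum(reachable=3, believed_size=5)  # up-to-date nodes
side_b = thinks_it_has_quorum(reachable=2, believed_size=3)  # stale nodes

print("side A accepts writes:", side_a)
print("side B accepts writes:", side_b)
print("split-brain:", side_a and side_b)

# With a correct view of the cluster size, the minority side would refuse.
print("side B with correct size:", thinks_it_has_quorum(2, 5))
```

In this model both sides of the partition accept writes as long as the stale nodes keep their old view of the size, and the problem disappears once every node agrees on it.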

Also, coordination services are typically critical, so it's important to isolate them from the bugs in the ad hoc code you're writing for your web tier, a crazy query in your database, etc.

It's much easier and safer in practice to just have 3 or 5 nodes running the coordination in isolation.

Edit: more reasons -- It's easier to deploy a coordination service to 5 nodes than 500. It's easier to debug 5 nodes than 500.

I probably should have said that it "can" run on any node. Yes, currently it does run on every node, but their roadmap doesn't have the requirement that every node be actively participating in elections.

I'm sure that you have seen fleets of dedicated Zookeeper nodes. I rather like that etcd is simply a service that can run on any node, and does not require a separate role-specific fleet of servers just to do coordination. That was the point I was attempting to make.