by toomuchtodo 3646 days ago
Qualified sysadmin/devops here: You can run a small Elasticsearch cluster (<9 nodes) without a dedicated ops person if necessary. Run with more than 3 nodes, ensure you have a proper number of index shards/replicas, and ALWAYS use an odd number of nodes.

Some overprovisioning will be required, but with the extra infra spend you're delaying the need for a dedicated role to manage it.

2 comments

Software I have to overprovision 3x to keep alive is crappy software.

Elasticsearch is by far one of the most obnoxious pieces of software I have ever had the misfortune to administer. I now avoid self-hosting it at all costs.

You don't "3x overprovision" Elasticsearch in particular; you should always provision a minimum of 3 nodes for any highly available clustered DB. With 1 node, if it fails, you're down. With 2 nodes, you can develop a partition or lose a node, and neither node can elect itself master, so you're down. With 3 nodes, you can lose a node or develop a partition, and the other 2 nodes can still reach a quorum on a new master and continue operating.
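To make the 1/2/3-node reasoning above concrete, here is a minimal Python sketch of the majority rule. It is not Elasticsearch code; `can_elect_master` and the scenario names are hypothetical, and it only assumes that a master can be elected by a group holding a strict majority of the cluster.

```python
def can_elect_master(cluster_size: int, reachable: int) -> bool:
    """Return True if a group of `reachable` nodes holds a strict majority.

    Hypothetical helper for illustration; it models the quorum rule only.
    """
    return reachable > cluster_size // 2


# Each scenario: (cluster size, nodes that can still talk to each other).
scenarios = {
    "1 node, it fails": (1, 0),
    "2 nodes, one fails or is partitioned away": (2, 1),
    "3 nodes, one fails or is partitioned away": (3, 2),
}

for name, (size, reachable) in scenarios.items():
    status = "up" if can_elect_master(size, reachable) else "down"
    print(f"{name}: {status}")
```

Only the 3-node case leaves a majority (2 of 3) after losing one node, which is why 3 is the floor rather than a 3x markup.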
This is how it should work, in theory. However, in practice, with ES whenever a single node went down the whole cluster would fail. Trying to add a new member to the cluster never worked, nor did trying to recover the failed node, hence the cluster-swap.

It doesn't help that their logging (at least pre-v2) is incredibly dense.

ES clusters will fail on the loss of a single node if you aren't running any replicas on your shards, but that's not really ES's fault. I've occasionally had a node just wig out and need to be restarted, but that's like a once-a-year thing, and I'm working on a ~TB cluster that processes a ridiculous number of writes - this isn't an underworked cluster by any means. As long as your cluster discovery mechanism is set up properly, adding and removing nodes from the cluster is about as easy as it gets. I'm certainly not saying that your experience wasn't valid, but my own experience with it has been that it's remarkably easy to manage.
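As a rough illustration of the replica point, the sketch below builds an index settings body in Python. `number_of_shards` and `number_of_replicas` are real Elasticsearch index settings; the helper names and the shard counts are made up for the example, and sending the body to a cluster is left out.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build a create-index body; the values passed in are illustrative."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def survives_single_node_loss(settings: dict) -> bool:
    """With at least one replica, every shard has a copy on a second node."""
    return settings["settings"]["number_of_replicas"] >= 1


no_replicas = index_settings(primary_shards=5, replicas=0)
one_replica = index_settings(primary_shards=5, replicas=1)

print(json.dumps(one_replica, indent=2))
print("0 replicas survives a node loss:", survives_single_node_loss(no_replicas))
print("1 replica survives a node loss:", survives_single_node_loss(one_replica))
```

This assumes the replica has actually been allocated to another node; an index with replicas configured but unassigned is no safer than one without.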

ES was pretty brittle in the pre-1.x days, but from 1.0 onward it's quite easy to work with. The logging is dense, but that's because it's thorough - a feature I really quite appreciate.

> and ALWAYS use an odd number of nodes.

Can you explain this one in more detail from a technical point of view?

You want an uneven number of master-eligible nodes, so you can form a proper quorum (two out of three, three out of five). The total number of nodes doesn't actually matter. Smaller installations often make the same node both master-eligible and data-holding, so in that case you'd prefer an uneven number of nodes. Once you move to a setup with dedicated master-eligible nodes, you're freed from that restriction. You could also run 4 nodes with a quorum of 3, but that will make the cluster unavailable if any two nodes die. The worst setup is 2 nodes, since you can only safely run with a quorum of 2, so if a node dies you're unavailable.
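The arithmetic behind this can be sketched in a few lines of Python. The function names are hypothetical; the quorum formula (half the master-eligible nodes, rounded down, plus one) is the usual majority rule, and in Elasticsearch versions of that era it is the value you would put in `discovery.zen.minimum_master_nodes`.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can die while a quorum remains."""
    return master_eligible - quorum(master_eligible)


print("nodes  quorum  tolerated failures")
for n in range(2, 6):
    print(f"{n:>5}  {quorum(n):>6}  {tolerated_failures(n):>18}")
```

The fourth node raises the quorum to 3 without raising the number of failures you can survive, so an even count adds cost but no extra fault tolerance.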
Thank you for explaining this, very helpful.