by elssar 3644 days ago
In terms of AWS costs, running your own cluster will cost significantly less, especially if you use reserved instances. The AWS ES instances cost ~1.6x more than their regular counterparts. Also, you can't reserve ES instances.

That's if you have a qualified sysadmin on staff to manage it. We tried managing our own ES setup for a little over a year before giving up. Despite our simple needs and low traffic volume, we regularly had difficult-to-diagnose issues with memory usage, node interconnectivity, and performance. With 3 EC2 nodes in a cluster, the cluster health would occasionally go to yellow and remain in a 'recovering' state indefinitely, bringing performance to a halt, and we'd essentially have to rebuild and swap the cluster.

Granted, none of us were proper ES admins, but we had a lot of experience working with system administration and specifically database performance and clustering. Despite that we were definitely in over our heads with ES.

Qualified sysadmin/devops here: You can run a small Elasticsearch cluster (<9 nodes) without a dedicated ops person if necessary. Run with more than 3 nodes, ensure you have a proper number of index shards/replicas, and ALWAYS use an odd number of nodes.

Some overprovisioning will be required, but with the extra infra spend you're delaying the need for a dedicated role to manage it.

Software I have to 3x overprovision to keep alive is crappy software.

Elasticsearch is by far one of the most obnoxious programs I have ever had the misfortune to administer. I now avoid self-hosting it at all costs.

You don't "3x overprovision" Elasticsearch in particular; you should always provision a minimum of 3 nodes for any highly available clustered DB. With 1 node, if it fails, you're down. With 2 nodes, if you develop a partition or lose a node, neither node can elect itself master, and you're down. With 3 nodes, you can lose a node or develop a partition, and the other 2 nodes can reach a quorum on a new master and continue operating.
This is how it should work, in theory. However, in practice, with ES whenever a single node went down the whole cluster would fail. Trying to add a new member to the cluster never worked, nor did trying to recover the failed node, hence the cluster-swap.

It doesn't help that their logging (at least pre v2) is incredibly dense.

ES clusters will fail on the loss of a single node if you aren't running any replicas on your shards, but that's not really ES's fault. I've occasionally had a node just wig out and need to be restarted, but that's like a once-a-year thing, and I'm working on a ~TB cluster that processes a ridiculous number of writes - this isn't an underworked cluster by any means. As long as your cluster discovery mechanism is set up properly, adding and removing nodes from the cluster is about as easy as it gets. I'm certainly not saying that your experience wasn't valid, but my own experience with it has been that it's remarkably easy to manage.
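To illustrate why replicas matter here, a minimal sketch under simplifying assumptions: shards are placed round-robin (the real ES allocator is more involved, though it likewise never puts a primary and its replica on the same node), and the helper names `allocate` and `survives` are hypothetical. A shard with no surviving copy is what turns a cluster red.

```python
def allocate(num_shards, num_replicas, nodes):
    """Hypothetical round-robin placement: each shard's primary and replicas
    land on distinct nodes (a simplification of ES's real allocator)."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes to keep copies apart")
    allocation = {}
    for shard in range(num_shards):
        # Copy i of this shard goes to the node i places after its starting node.
        allocation[shard] = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
    return allocation


def survives(allocation, failed_node):
    """True if every shard still has at least one copy on a live node."""
    return all(
        any(node != failed_node for node in copies)
        for copies in allocation.values()
    )


nodes = ["node-1", "node-2", "node-3"]
no_replicas = allocate(5, 0, nodes)
one_replica = allocate(5, 1, nodes)
print(survives(no_replicas, "node-2"))                    # False
print(all(survives(one_replica, n) for n in nodes))       # True
```

With zero replicas, every node holds the only copy of some shard, so losing any single node loses data; with one replica, any single node can go away.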

ES was pretty brittle in the pre-1.x days, but from 1.0 onward it's quite easy to work with. The logging is dense, but that's because it's thorough - a feature I really quite appreciate.

> and ALWAYS use an odd number of nodes.

Can you explain this one in more detail from a technical point of view?

You want an odd number of master-eligible nodes so you can form a proper quorum (two out of three, three out of five). The total number of nodes doesn't actually matter. Smaller installations often make the same node both master-eligible and data-holding, so in that case you'd prefer an odd number of nodes. Once you move to a setup with dedicated master-eligible nodes, you're freed from that restriction. You could also run 4 nodes with a quorum of 3, but that will make the cluster unavailable if any two nodes die. The worst setup is 2 nodes, since you can only safely run with a quorum of 2, so if a node dies you're unavailable.
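The arithmetic behind this, as a small sketch: the quorum is a strict majority of master-eligible nodes, and the nodes you can lose is whatever is left over. In the ES 1.x to 6.x releases, the quorum value is what you would set in `discovery.zen.minimum_master_nodes`; 7.0 and later manage this automatically. The function names here are just for illustration.

```python
def quorum(master_eligible):
    """Strict majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """Master-eligible nodes that can be lost while a quorum still remains."""
    return master_eligible - quorum(master_eligible)


for n in range(1, 6):
    print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
# 1 master-eligible: quorum=1, can lose 0
# 2 master-eligible: quorum=2, can lose 0
# 3 master-eligible: quorum=2, can lose 1
# 4 master-eligible: quorum=3, can lose 1
# 5 master-eligible: quorum=3, can lose 2
```

Going from 3 to 4 nodes raises the quorum without letting you lose any more nodes, which is why the even count buys nothing and 2 is strictly worse than 1 or 3.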
Thank you for explaining this, very helpful.
That's why I said, "in terms of AWS costs".

I've been managing our Elasticsearch cluster for the past year and a bit. It grew from a single node that also ran Kibana, Logstash, and nginx and stored our MySQL backups, to 9 data nodes, 1 client node, and a dedicated master. I have faced issues, but never had to rebuild the cluster. For the most part, reading up on the ES docs and making a config change fixed the issue. Sometimes I've had to restart a node, but that's rare.