Hacker News
by rdtsc 4071 days ago
Mandatory reading -- last year's Call Me Maybe: Elasticsearch

https://aphyr.com/posts/317-call-me-maybe-elasticsearch

I've been hearing a lot of people talk about Elasticsearch lately. I get the same gut feeling I was getting about MongoDB back during the "Webscale" days.

3 comments

In my experience, Elasticsearch is the single most common source of infrastructure downtime and service failure. It's basically my arch nemesis.
I am interested to hear a bit more about this, as I find it hard to believe. I have only run it at pretty small scale: 8 servers, around 300 million documents indexed a day, peak index rate 30k docs/sec. I found that you have to monitor it correctly, tune the JVM slightly (mostly GC), give it fast disks and lots of RAM, and use the correct architecture (search, index & data nodes) to get the most out of it. Once I did that it was one of the most reliable components of my infrastructure, and still is. I would recommend chatting to people on the Elasticsearch IRC channel or mailing list; everyone was a great help to me there.
The full explanation deserves a blog post, but in a nutshell it revolves around the issue that ES contains a huge amount of complexity around a feature that is actually fairly useless (the "elastic" part) or at least difficult to use correctly. I've found that you need to be a deep expert in ES to architect and run it properly (or have access to such expertise) and even then it requires regular care and feeding to maintain uptime. In a short-deadline startup world you probably won't have time for any of that--once it's working it will lull you into a false sense of security and then completely blow up a few weeks/months later.
Same here. A single node failure has led to the whole cluster crashing down around me on more than one occasion.
Really? Perhaps I was never running it at a large enough scale, but even pre-v1.0 I've basically never had any trouble with it (outside of operational concerns like occasionally confusing query syntax). Then again, I never had more than 11 servers in the cluster, so I may just have never run into problems at scale.
While I don't necessarily disagree, I do find that this depends entirely on how ES is used. All too often people dive headfirst into using Elasticsearch in ways it really should not be used.
It can't be worse than RabbitMQ... can it?
I use ES only for search (indexes built from a DB), so losing data isn't a massive drama. It's great for my use case.
That sounds like the intended use. I should qualify my comment: I have heard it advocated as primary data storage.
I've heard of only a few cases where people were using ES as primary storage, and even there they acknowledged that they were probably crazy for doing so.
Yeah, I had an argument about that over at Reddit, where someone advocated ES as an alternative to Cassandra. >___< Hopefully I convinced the user otherwise.
Elasticsearch is just a text search engine based on Lucene. You use ES, Solr, or the Lucene library directly if you want fuzzy search and such.

You really want to use it in tandem with a storage DB such as PostgreSQL, Cassandra, or MongoDB, where ES or any Lucene-based indexer/DB would be used only for text searching.

I personally like PostgreSQL and Cassandra, and would use either in tandem with ES. Solr, last I checked, was a bit complicated to cluster.
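
To make the "in tandem" pattern concrete, here is a rough sketch under loud assumptions: `PrimaryStore` and `SearchIndex` are hypothetical in-memory stand-ins (for something like PostgreSQL and ES respectively), not real client APIs, and the helper names `save`, `rebuild`, and `find` are made up for illustration. The point is only that the index holds derived data and can be thrown away and rebuilt from the system of record.

```python
from collections import defaultdict


class PrimaryStore:
    """Hypothetical stand-in for the system of record (e.g. PostgreSQL)."""

    def __init__(self):
        self._rows = {}

    def put(self, doc_id, text):
        self._rows[doc_id] = text

    def get(self, doc_id):
        return self._rows[doc_id]

    def items(self):
        return list(self._rows.items())


class SearchIndex:
    """Hypothetical stand-in for a Lucene-based index; holds derived data only."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, doc_id, text):
        # Toy tokenizer: lowercase and split on whitespace.
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))

    def clear(self):
        self._postings.clear()


def save(store, index, doc_id, text):
    # Write to the system of record first, then update the index.
    # This sketch ignores updates and deletes of existing documents.
    store.put(doc_id, text)
    index.index(doc_id, text)


def rebuild(store, index):
    # Losing the index is not data loss: regenerate it from the primary store.
    index.clear()
    for doc_id, text in store.items():
        index.index(doc_id, text)


def find(store, index, term):
    # The index returns ids; the documents come from the primary store.
    return [store.get(doc_id) for doc_id in index.search(term)]
```

With this split, a wiped or corrupted index costs you a `rebuild(store, index)` run rather than your data.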

Agreed. Cassandra is especially nice if you have the DataStax Enterprise version which allows for seamless integration between the two.
> Solr, last I checked, was a bit complicated to cluster

SolrCloud, with ZooKeeper, is relatively new and not too difficult to set up.

Does it still have the issue where you have to take the cluster down to create a new index or modify existing ones?
No, search for MergeIndexes or --go-live.
What about storing data for analytics? Wouldn't it be better to use ES than Postgres for that?