I've been hearing a lot of people talk about Elasticsearch lately. I get the same gut feeling I was getting about MongoDB back during the "Webscale" days.
I am interested to hear a bit more about this, as I find it hard to believe. I have only run it at pretty small scale: 8 servers, around 300 million documents indexed a day, peak index rate 30k docs/sec. I found that you have to monitor it correctly, tune the JVM slightly (mostly GC), give it fast disks, lots of RAM, and the correct architecture (search, index & data nodes) to get the most out of it. Once I did that it was one of the most reliable components of my infrastructure, and still is. I would recommend chatting to people on the Elasticsearch IRC channel or mailing list; everyone there was a great help to me.
The full explanation deserves a blog post, but in a nutshell it revolves around the issue that ES contains a huge amount of complexity around a feature that is actually fairly useless (the "elastic" part) or at least difficult to use correctly. I've found that you need to be a deep expert in ES to architect and run it properly (or have access to such expertise) and even then it requires regular care and feeding to maintain uptime. In a short-deadline startup world you probably won't have time for any of that--once it's working it will lull you into a false sense of security and then completely blow up a few weeks/months later.
Really? Perhaps I was never running it at a large enough scale, but even pre-v1.0 I've basically never had any trouble with it (outside of operational concerns like occasionally confusing query syntax). Then again, I never had more than 11 servers in the cluster, so I may just have never run into problems at scale.
While I don't necessarily disagree, I do find that this depends entirely on how ES is used. All too often people dive headfirst into using Elasticsearch in ways it really should not be used.
I've heard of only a few cases where people were using ES as primary storage, and even there they acknowledged that they were probably crazy for doing so.
Yeah, I had an argument about that over at reddit, where someone advocated ES as an alternative to Cassandra. >___< Hopefully I convinced the user otherwise.
Elasticsearch is just a text search engine based on Lucene. You use ES, Solr, or the Lucene library directly if you want fuzzy search and such.
You really want to use it in tandem with a storage DB (PostgreSQL, Cassandra, MongoDB), where ES or any Lucene-based indexer/DB would be used only for text searching.
I personally like PostgreSQL and Cassandra, and would use either in tandem with ES. Solr, last I checked, was a bit complicated to cluster.
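
To make the "storage DB plus search index" split concrete, here is a minimal Python sketch. It rests on loud assumptions: `sqlite3` stands in for the primary store (PostgreSQL, Cassandra, etc.), and `ToySearchIndex` is a hypothetical in-memory inverted index standing in for ES/Solr/Lucene. It is not a real Elasticsearch client, and it does exact term matching only, with no fuzzy search. The `articles` table and the function names are made up for illustration. The point is just the shape: writes go to the database first, the index holds only ids, and the index can be rebuilt from the database at any time.

```python
import re
import sqlite3
from collections import defaultdict


class ToySearchIndex:
    """Hypothetical stand-in for ES/Solr/Lucene: a tiny in-memory inverted index."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    @staticmethod
    def _tokens(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def index(self, doc_id, text):
        for term in self._tokens(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term.
        terms = self._tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result

    def clear(self):
        self._postings.clear()


# Primary store (assumption: SQLite standing in for the real database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
search = ToySearchIndex()


def save_article(title, body):
    """Write to the source of truth first, then update the search index."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
        )
    doc_id = cur.lastrowid
    search.index(doc_id, title + " " + body)
    return doc_id


def find_articles(query):
    """Ask the index for matching ids, then load the rows from the database."""
    ids = sorted(search.search(query))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    )
    return rows.fetchall()


def rebuild_index():
    """The index is disposable: it can always be rebuilt from the database."""
    search.clear()
    for doc_id, title, body in db.execute("SELECT id, title, body FROM articles"):
        search.index(doc_id, title + " " + body)
```

If the index is lost or corrupted, nothing of value is gone: calling `rebuild_index()` repopulates it from the database, which is the main reason not to treat the search engine as primary storage.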