by kainosnoema 4515 days ago
I'm surprised so many people miss this. Out of the box, Elasticsearch is a distributed NoSQL store with better write consistency (and arguably performance) than MongoDB offers in its default configuration. The major missing feature was backup snapshots and restores, which 1.0 delivers, along with aggregations that more than rival MongoDB's. The team has intentionally avoided marketing themselves as a NoSQL store (I was told this directly by an employee), but they're aware of the potential and have customers using it as such.
3 comments

It's easy to miss. On the front page, the word "store" only occurs once, buried three page-scrolls down in the body text. Otherwise it very much gives the impression of being some kind of analytics dashboard for third-party datastores. And I didn't notice that until after I'd visited the website, clicked through a few links trying to figure out what the fuss was about, then given up and decided to read the comments here.
Probably because some store features were missing until 1.0, like backup/restore without knowing database internals. (Yes, rsync did the job, but only because you knew the list of guarantees that make it possible.)

Also, Lucene at its core is an index. Changing the query strategy might require reindexing. It is perfectly valid to throw data at it, build the index and throw away the source. You will just never get it back again.

While ES can be used and tuned as a store just fine, it is not necessarily its raison d'etre.
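
To illustrate the point about an index without its source, here is a toy inverted index in Python. It is only a sketch, not how Lucene is implemented: the names `build_index` and `search` are made up for this example, and tokenizing is just lowercasing and splitting on whitespace. Once the source is dropped, searches still work, but the original text (word order, casing) cannot be rebuilt from the index alone.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Lucene is an index", 2: "An index is not a store"}
index = build_index(docs)
del docs  # throw away the source; only the term -> ids mapping remains

hits = search(index, "index store")  # matching still works without the source
```

Changing the tokenizer here (say, to split on punctuation as well) would mean rebuilding the index from the source, which is the reindexing problem in miniature.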

While I agree with the sentiment, I think Shay (lead ES developer) has explicitly said that he does not consider ES to be a data store... yet. I think this is mostly due to maturity.

I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.

He has indeed said that. We hosted the Elasticsearch meetup in NYC a couple of weeks ago, and he specifically said it there.
I did not know all that. Could Elasticsearch be the holy grail of document stores?
No. The choice of datastore is still incredibly complicated in the distributed world; it's all about tradeoffs really.

For example, Elasticsearch has poor availability characteristics - both because it is master-slave and because it focuses on ensuring consistency - relative to, for example, something like Riak.

I don't believe it's "master-slave" in the way you're thinking. Elasticsearch shards its indexes among all available nodes, storing replicas of each shard on separate nodes when possible. This ensures that the entire cluster is available as long as at least one replica of a shard is still online. In fact, if configured properly, it has better availability than consistency since by default it only flushes its oplog to the Lucene index segments every second (though writes aren't considered committed until they reach a quorum of nodes, so consistency is fairly good in practice as well).
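
The read/write distinction described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Elasticsearch's actual code: the function names are hypothetical, a shard is treated as readable when at least one copy is online, and a write is accepted only when a plain majority of copies (one primary plus its replicas) is online. Real releases may treat small replica counts specially.

```python
def write_quorum(number_of_replicas):
    """Plain majority of all copies of a shard (1 primary + replicas)."""
    copies = 1 + number_of_replicas
    return copies // 2 + 1


def can_read(copies_online):
    """Assumed rule: a shard is readable while any copy is online."""
    return copies_online >= 1


def can_write(copies_online, number_of_replicas):
    """Assumed rule: a write needs a majority of the shard's copies online."""
    return copies_online >= write_quorum(number_of_replicas)


# One primary plus two replicas, with two of the three nodes lost:
readable = can_read(1)
writable = can_write(1, number_of_replicas=2)
```

In this model a shard with one surviving copy out of three still serves reads but rejects writes, which is the tradeoff being argued about in this subthread.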
It is definitely a nice and flexible option, but it truly depends on what your needs are. If you're often updating parts of a document, MongoDB or RethinkDB may be better options. If you want integration where a lot of parts are SQL with some document ability, PostgreSQL + V8 is pretty compelling. Also, something like Cassandra may suit your needs better if you want a better and more predictable growth curve.

There's no holy grail of data storage. Elasticsearch is really nice, and if it fits your needs, more power to you.

Well, maybe some day, but it is still too easy to corrupt the data or index. Recently I had a problem where the data itself was fine and searches worked correctly, but they were 100x slower than they should have been. It just started happening for no apparent reason, and I only do basic searches on typical data. I still don't know what happened, but creating a new index fixed the problem.