	by RyanZAG 4515 days ago
	Elasticsearch is really awesome for searching, but what most people don't realize is that it makes a better MongoDB than MongoDB while giving you that searching too.

9 comments

bilbo0s 4515 days ago

This. A THOUSAND TIMES "This".

The one drawback ES had in the bad old days was that backup and restore was a nightmare... ESPECIALLY on AWS. The new system they introduced was so simple I was concerned about updating to it because I was SURE something would go south.

But it all just worked.

I still have the Couch to ES replication running because I'm anal like that... but really... yeah... you can do without Couchbase, Mongo et al... ES will probably do everything you need PLUS everything you can't do in the others.

diminish 4515 days ago

As a proud user of Elasticsearch since the early days, I'm happy to see so much progress. Never mind the *search part of the name; it's really a database for all practical purposes, especially for web data.

rjzzleep 4514 days ago

to be fair, the main selling point of mongodb is that developers can access it more easily. i haven't really touched mongodb in over a year, and then only for playing, but have you tried the elasticsearch filter query syntax? have you compared it with mongodb's syntax?

also, i have the exact opposite nitpick. people want to use it to do everything, mail indexers, file system indexers. what's the matter with web developer folks? why is it that when the next database comes around they want to use it for everything?

bilbo0s 4514 days ago

"....why is it that when the next database comes around they want to use it for everything?...."

Because they like a simple web stack. KISS means a faster time to market. Faster time to iterate. Faster time to fix bugs because there are fewer places those bugs can be. All of that doesn't even factor in the productivity benefits gained by not having to switch technologies from project to project.

But to be fair, ES is not some brand new database... ES has been around for a LONG time.

rpedela 4514 days ago

Apache Lucene has been around a while. ES has been around since 2010.

bilbo0s 4514 days ago

Yeah...

that's a pretty long time.

AznHisoka 4515 days ago

Just curious, if I'm using, say, version 0.92, how would I go about backing up my ElasticSearch instance, besides creating a replica on a server and then "freezing" it by disconnecting the server?

polyfractal 4515 days ago

The pre-Snapshot/restore method is:

- Pause indexing

- Issue a flush request

- Rsync data directories somewhere

- Resume indexing

This is technically a very naive approach, since a simple rsync of the data dirs will include replicas too. If you were more diligent you could check the state files in each shard directory and only copy out the primaries.
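The steps above can be sketched as a dry run that only prints the commands. This is a minimal sketch, not a tested backup script: the URL, the data directory and the destination path are assumptions, and pausing and resuming indexing is left as a placeholder because it depends entirely on your application.

```python
import shlex


def backup_plan(es_url, data_dir, dest):
    """Return the pre-snapshot backup steps as shell commands (dry run only)."""
    return [
        # Step 1 is application-specific: stop whatever is sending writes.
        "# pause indexing in your application",
        # Step 2: ask Elasticsearch to flush so pending changes reach disk.
        f"curl -XPOST {shlex.quote(es_url + '/_flush')}",
        # Step 3: copy the data directories (naive: this copies replicas too).
        f"rsync -a {shlex.quote(data_dir + '/')} {shlex.quote(dest)}",
        # Step 4 is application-specific again.
        "# resume indexing in your application",
    ]


# Hypothetical values, for illustration only.
for step in backup_plan("http://localhost:9200", "/var/lib/elasticsearch", "/backup/es"):
    print(step)
```

Nothing is executed here; the point is only to make the order of operations explicit before anyone wires it into a real script.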

bilbo0s 4515 days ago

Polyfractal is right.

You can just google "elasticsearch rsync" to get information, and even scripts, that will do this for you. The thing is... you REALLY need to know what you're doing when you go this route.

Also, you can try the gateway feature. Gateway is actually pretty straightforward. Restore WILL be slow though. And for many scenarios ... it is not ideal. (You don't want to take a day, or even a few, to restore after a failure.)

I think the best advice is...

Update to 1.0.

Just go to 1.0 and do snapshots... you will save yourself A LOT of headaches.
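For reference, the 1.0 snapshot flow boils down to three REST calls: register a repository, take a snapshot, restore it. The sketch below only builds the requests rather than sending them; the repository name, snapshot name and filesystem location are made-up values, and a shared filesystem ("fs") repository is assumed.

```python
import json


def snapshot_requests(repo, snapshot, location):
    """Build (method, path, body) tuples for the snapshot/restore REST calls."""
    # Register a shared-filesystem repository (assumed repository type).
    register = ("PUT", f"/_snapshot/{repo}",
                {"type": "fs", "settings": {"location": location}})
    # Take a snapshot and block until it has finished.
    take = ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None)
    # Restore that snapshot later.
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None)
    return [register, take, restore]


# Hypothetical names, for illustration only.
for method, path, body in snapshot_requests("my_backup", "snapshot_1", "/mount/backups"):
    print(method, path, json.dumps(body) if body else "")
```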

kainosnoema 4515 days ago

I'm surprised so many people miss this. Out of the box, Elasticsearch is a distributed NoSQL store with better write consistency (and arguably performance) than MongoDB offers in its default configuration. The major missing feature was backup snapshots and restores, which 1.0 delivers, along with aggregations that more than rival MongoDB's. The team has intentionally avoided marketing themselves as a NoSQL store (I was told this directly by an employee), but they're aware of the potential and have customers using it as such.

nkoren 4515 days ago

It's easy to miss. On the front page, the word "store" only occurs once, buried three page-scrolls down in the body text. Otherwise it very much gives the impression of being some kind of analytics dashboard for third-party datastores. And I didn't notice that until after I'd visited the website, clicked through a few links trying to figure out what the fuss was about, then given up and decided to read the comments here.

Argorak 4515 days ago

Probably because some store features have been missing up to 1.0, like backup/restore without knowing database internals. (yes, rsync did the job, but only because you knew the list of guarantees that makes it possible).

Also, Lucene at its core is an Index. Changing the query strategy might require reindexing. It is perfectly valid to throw data at it, build the index and throw away the source. You will just never get it back again.

While ES can be used and tuned as a store just fine, it is not necessarily its raison d'etre.

gibrown 4515 days ago

While I agree with the sentiment, I think Shay (lead ES developer) has explicitly said that he does not consider ES to be a data store... yet. I think this is mostly due to maturity.

I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.

spooneybarger 4515 days ago

He has indeed said that. We hosted the Elasticsearch meetup in NYC a couple of weeks ago and he specifically said it.

camus2 4515 days ago

Did not know all that stuff. Could Elasticsearch be the holy grail of document stores?

room271 4515 days ago

No. The choice of datastore is still incredibly complicated in the distributed world; it's all about tradeoffs really.

For example, Elasticsearch has poor availability characteristics - both because it is master-slave and because it focuses on ensuring consistency - relative to, for example, something like Riak.

kainosnoema 4512 days ago

I don't believe it's "master-slave" in the way you're thinking. Elasticsearch shards its indexes among all available nodes, storing replicas of each shard on separate nodes when possible. This ensures that the entire cluster is available as long as at least one replica of a shard is still online. In fact, if configured properly, it has better availability than consistency since by default it only flushes its oplog to the Lucene index segments every second (though writes aren't considered committed until they reach a quorum of nodes, so consistency is fairly good in practice as well).

tracker1 4515 days ago

It is definitely a nice, and flexible option.. it truly depends on what your needs are... If you're often updating parts of a document, MongoDB or RethinkDB may be better options. If you want integration where a lot of parts are SQL with some document ability, PostgreSQL + V8 is pretty compelling. Also, something like Cassandra may suit your needs better if you want a better and more predictable growth curve.

There's no holy grail of data storage... ElasticSearch is really nice, and if it fits your needs, more power to you.

rpedela 4513 days ago

Well, maybe some day, but it is still too easy to corrupt the data or index. Recently I had a problem where the data itself was fine and searches worked correctly, but it was 100x slower than it should be. It just started happening for no apparent reason, and I just do basic searches on typical data. I still don't know what happened, but creating a new index fixed the problem.

sandGorgon 4515 days ago

I had a live production logistics system running on top of Elasticsearch 0.6 (as a NoSQL database) back in 2012. This powered one of India's largest ecommerce systems (at that time).

Elasticsearch is brilliant as a NoSQL store - and if you were already using Elasticsearch as a search system, you don't need to introduce yet another component into your stack.

axefrog 4515 days ago

What limitations should one be aware of that would make ElasticSearch not a viable candidate where something like MongoDB would be a better fit?

RyanZAG 4515 days ago

When running a search, ES by default will not show items that have been indexed in the last 1 second. Directly getting an item by its ID doesn't have that limit though, and you can optionally force a refresh so that a search shows all items.
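To make the difference concrete, here is a rough sketch of the requests involved, built as plain tuples rather than sent anywhere. The index, type and document names are placeholders, and the paths follow the 1.x-era URL layout (index/type/id), which is an assumption about the version in use.

```python
def visibility_requests(index, doc_type, doc_id):
    """Build the requests that relate to when a document becomes visible."""
    return {
        # Get-by-ID is not subject to the search refresh delay.
        "get_by_id": ("GET", f"/{index}/{doc_type}/{doc_id}", None),
        # An explicit refresh makes recently indexed documents searchable.
        "refresh": ("POST", f"/{index}/_refresh", None),
        # The delay itself is the index-level refresh interval setting.
        "set_interval": ("PUT", f"/{index}/_settings",
                         {"index": {"refresh_interval": "1s"}}),
    }


# Hypothetical index, type and ID, for illustration only.
for name, request in visibility_requests("orders", "order", "42").items():
    print(name, request)
```

Refreshing on every write costs indexing throughput, which is why this is a tuning decision rather than a default.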

Other than that (which is just performance tuning, really), ES matches mongodb feature for feature, and obviously has a lot of extra power from its search heritage such as facets and percolate.

So I can't actually think of any limitations, and it's why I said ES makes a better MongoDB than MongoDB.

alisson 4515 days ago

On ElasticSearch you have to update the whole document; there are no commands to manipulate it in place. You don't have commands like $set, $addToSet, $pop, etc.

You need to have a good understanding of how tokenizers and analyzers work to be able to create good results for your data. I have difficulties matching documents with the exact title being searched for. On MongoDB that just works, on ElasticSearch you need to configure it.

ElasticSearch has some advantages and MongoDB others. I think they are great together. One for storage and the other for searching.

polyfractal 4515 days ago

Regarding updates, you can use the Update API for partial updates, and include a script to do things like "counter += 1" or "add value to existing array".

Internally it is still reindexing the entire document, but from your application's perspective, the Update API is a lot friendlier.

http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
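A rough illustration of what such update requests look like, as request bodies only. The script syntax shown is the 1.x-era form (a script string plus params) and later versions changed it, so treat this as a sketch; the index, type, ID and field names are placeholders.

```python
import json


def update_request(index, doc_type, doc_id, body):
    """Build a (method, path, json_body) tuple for the Update API."""
    return ("POST", f"/{index}/{doc_type}/{doc_id}/_update", json.dumps(body))


# Scripted update: increment a counter field by a parameter.
increment = update_request(
    "test", "type1", "1",
    {"script": "ctx._source.counter += count", "params": {"count": 1}},
)

# Partial-document update: merge these fields into the stored document.
partial = update_request("test", "type1", "1", {"doc": {"name": "new_name"}})

print(increment)
print(partial)
```

Either way the document is reindexed as a whole on the server, as noted above; the request just spares the client the read-modify-write round trip.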

alisson 4515 days ago

Thanks for pointing that out, it will be really useful!

xtracto 4515 days ago

>You need to have a good understanding of how tokenizers and analyzers work to be able to create good results for your data.

This is really important. Creating a proper search experience with auto-complete that works "just like you want" can be very painful with ES, especially if you are new to it. It bit me some time ago when I was trying to achieve just that.

hkon 4514 days ago

Care to elaborate? What were the steps you had to go through?

scorpion032 4512 days ago

For storage of data, I'd use only an RDBMS like Postgres. Not Mongo.

brasetvik 4515 days ago

I can't comment much on MongoDB, but I've written a bit about things to keep in mind when considering Elasticsearch as a NoSQL store here: https://www.found.no/foundation/elasticsearch-as-nosql/

curun1r 4515 days ago

An interesting read, but I'd disagree with your contention that NoSQL isn't about ACID. When NoSQL databases started coming out, it was really about which CAP guarantee a database chooses to compromise. Traditional SQL databases are either partition-intolerant or become unavailable (for writes) in the event of a partition. NoSQL databases compromise on consistency. If a database is claiming to be NoSQL and have ACID transactions, they've either disproven CAP or aren't part of the new group of distributed, partition-tolerant databases that people have been calling NoSQL. It's been said for a while that NoSQL is a terrible name for that group of technologies and now that we're getting databases with a non-SQL interface but also having consistency guarantees, the name is starting to cause even more confusion.

Side note: Happy Found customer here...you guys have made it much easier to run our ES index!

brasetvik 4515 days ago

Thanks for the feedback!

The point of that section is exactly that "NoSQL" (or, to make things even more confusing, "NOSQL" for "Not only SQL") doesn't have a very specific meaning. Some think it rules out ACID, others don't. Thus, you'll need to know what you need.

And database marketing tends not to be very good at pointing out what the product is not good at, or at actually delivering what it promises. See also: http://aphyr.com/tags/jepsen

room271 4515 days ago

I'm not sure you have this right. CAP says nothing about ACID - it only mentions consistency.

NoSQL was in large part about precisely what the name implies - giving up relational (SQL) data in exchange for better performance and the ability to have a distributed store. Yes, part of this is also about being willing to trade off consistency for availability. But Elasticsearch is an example of a NoSQL store which does focus on consistency (in this case at the expense of availability and, to some extent, partition tolerance).

sjs382 4515 days ago

I'm not sure if ElasticSearch does anything like this, but I make use of MongoDB's GeoJSON queries, namely the $geoIntersects operator.

http://docs.mongodb.org/manual/applications/geospatial-index...

sjs382 4515 days ago

Wow, it looks like they do... http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

polyfractal 4515 days ago

In addition to the various geo filters/queries, there are also two aggregations for geo related stuff:

Geohash Grid: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

Geodistance: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

morganherlocker 4515 days ago

Might not matter, but they do not follow the geojson spec for spatial storage.

Argorak 4514 days ago

Sure, ES supports lat/lon as properties, strings, geohash and geojson:

http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

morganherlocker 4514 days ago

I could be totally wrong, but the docs you linked to do not actually conform to the geojson spec. It is geographic and it is json, but not valid geojson. The part where it says:

> Format in [lon, lat], note, the order of lon/lat here in order to conform with GeoJSON.

.. the data example below is not actually geojson. See the spec:

http://geojson.org/geojson-spec.html
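To illustrate the distinction being drawn, here is a small sketch comparing a GeoJSON Point object with the bare [lon, lat] array form quoted above. The helper names are made up for this example; the coordinate values are arbitrary.

```python
def geojson_point(lon, lat):
    """A GeoJSON Point: an object with a type and a coordinates member."""
    return {"type": "Point", "coordinates": [lon, lat]}


def lon_lat_array(lon, lat):
    """The bare array form: same lon/lat ordering, but no GeoJSON wrapper."""
    return [lon, lat]


point = geojson_point(-71.34, 41.12)
array = lon_lat_array(-71.34, 41.12)

# Same coordinate order, but only the first is a GeoJSON geometry object.
print(point)
print(array)
```

The array shares GeoJSON's longitude-first ordering, which is presumably what the docs mean by "conform", but on its own it is not a GeoJSON object.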

abhirama 4514 days ago

When I played around with it, I could not figure out a way to get the exact count of events in the datastore when the data was distributed across replicas. In fact, there was a ticket open for this, but I'm not able to fish it out now.

ddorian43 4515 days ago

presharding

You create a number of shards for each index(database) that you can't later expand.

RyanZAG 4515 days ago

Is this still a limitation? I haven't run into any use cases where this has been a problem yet. Since the default shards are 10 and 2 replicas, does that not mean each index should be able to scale up to 20 servers? I'd think that if your data grew enough that 1/10th does not fit on a server, you could do a one time maintenance and rebuild all your servers.

I have my doubts mongodb would scale up that well to 20+ servers without some maintenance as well. So I'm not sure how that's really a limitation anyone should use for choosing mongodb or ES. If you're expecting that kind of data, just make a large number of shards in your index creation as it will work fine on fewer servers too?

ddorian43 4515 days ago

you can grow a little larger than that by using some nodes only for aggregating/handling queries(holding no data/shards)

larger number of shards=slower searching (unless you distribute the shards to multiple nodes)

AznHisoka 4515 days ago

What I've done, and I'm not totally sure if it's a best practice, is over-allocate the # of shards. So if I think I need 5 shards, I create 50 or 100 shards instead. Then I'll have some app logic to determine the shard a document should go to. Initially all docs will go to shard 0. Then when that's full (around 15 GB in size, depending on your RAM), I set all docs to go to shard 1. Of course, you'll need to be careful, as you don't want duplicate documents in different shards.

The benefit of this is that as your app scales, you'll search only the shards needed. So if you have just 1 shard w/ data, you can tell ElasticSearch to just search in that 1 shard.
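The app logic described above could look roughly like this. It is only a sketch of the bucket-picking step: the function name and the size bookkeeping are hypothetical, and how a bucket number is mapped onto an actual Elasticsearch shard (for instance through a routing value) is left out.

```python
def current_bucket(bucket_sizes_gb, limit_gb=15.0):
    """Return the index of the first bucket that is still below the size limit."""
    for i, size in enumerate(bucket_sizes_gb):
        if size < limit_gb:
            return i
    # Every pre-allocated bucket is full: the app has to handle this case.
    raise RuntimeError("all pre-allocated buckets are full")


# Hypothetical sizes in GB for the first few of the over-allocated buckets.
sizes = [15.2, 15.0, 3.7, 0.0, 0.0]
print(current_bucket(sizes))  # bucket 2 is the first one with room
```

Searches can then be limited to buckets 0 through the current one, which is the benefit described above.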

aquadrop 4515 days ago

So, what happens when you fill up the last shard?

ddorian43 4515 days ago

look: routing_field

ddorian43 4515 days ago

also changing indexed-fields on the go

mtrn 4515 days ago

True. I evaluated Mongo, Couch and a couple of similar solutions, but ES being a search engine from the start really convinced me, that it can be a viable database for loosely structured data.

g9yuayon 4515 days ago

I don't know much about MongoDB, but it's true that Elasticsearch is a great NoSQL db with support of boolean search. Netflix has a number of use cases that use Elasticsearch as such NoSQL db: http://www.slideshare.net/g9yuayon/elasticsearch-in-netflix

ErrantX 4514 days ago

Definitely! We are using it in production for storing monitoring data (via sensu, if anyone is interested). It's fantastic because you can shove data into the index with a ttl of 1 year. And have a x month archival strategy for cold storage.

Its search capabilities and scalability are fantastic - we're throwing GBs of data into it weekly and it just soaks it up.

tracker1 4515 days ago

I would suggest that everyone who is considering one look at both... When I looked into both, about a year and a half ago, I found that geospatial searches worked better in MongoDB at the time, and shaping my data to fit was more awkward with ElasticSearch.

That said, it's definitely worth looking into both, depending on what your needs are.

obastemur 4514 days ago

"most people don't realize is that it makes a better MongoDB than MongoDB "

(IMHO) Unfortunately, for most people, old habits die hard. Indeed a nice project and a great release.