Hacker News
Elasticsearch vs. Google
8 points by donboscow 1568 days ago
Hi, I am currently looking at Elasticsearch. It looks like a great search engine tool. I was wondering: how does it perform compared with established players like Google? For example, it uses the Okapi BM25 algorithm; how does that measure up against Google's PageRank? Are the quality and relevance of its search results on par with what Google offers? Secondly, what about performance? Google handles 5 billion queries per day, with natural peak loads around sports events or other big incidents like elections. Is Elasticsearch capable of scaling to this level with the right combination of sharding, distribution and indexing?
4 comments

You need to do a lot more learning about this subject.

BM25 scores documents based on their contents (how well their text matches the query terms).

PageRank scores documents based on their sources (the links pointing to them), independent of any query.

Very different.
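
To make the contrast concrete, here is a minimal sketch of the classic Okapi BM25 formula in Python. It assumes naive lowercase whitespace tokenization (Elasticsearch would use the field's analyzer), k1 = 1.2 and b = 0.75, and a Lucene-style IDF of log(1 + (N - n + 0.5) / (n + 0.5)). The toy corpus is hypothetical. Note that only the query terms and the document text enter the score; links play no role.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "elasticsearch is a distributed search engine",
    "google search ranks pages with many signals",
    "a recipe for banana bread",
]
print(bm25_scores("search engine", docs))
```

The first document matches both query terms and scores highest, the second matches only "search", and the third scores zero. Nothing about who links to these pages could change that ordering, which is exactly what PageRank adds.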

If you are starting now, you start with Elasticsearch, because you can't start with PageRank and all the thousands of other things that make up what you think of as Google Search.

If you have years, people and billions of dollars, you can start building towards Google-like capabilities. Elasticsearch allows you to add capabilities like the ones Google provides, but you still have to build them.

Build your first round of search on Elasticsearch, and you'll likely learn enough to start understanding the difference between the two.

I am aware that I need more learning; no need for that advice. I also know that BM25 and PageRank are different. If they were the same, there wouldn't be a need for both. My question was very clear: which one, in the long run, yields better search results in terms of relevance? I am also aware that ES is a starter pack and that Google is a beast that needs much more than just an ES stack. What I wanted to know was whether ES offers the capabilities to fine-tune it so that it behaves like such a beast. Thank you for your advice, but it is better to read the question first before lecturing others in a condescending manner.

Theoretically ES can scale that big, but it isn't easy and it takes a large team to manage. Just like at Google.

ES is a toolbox, not something you can just use off the shelf. I think Elastic likes to make people think it is an off-the-shelf solution, but the defaults aren't great for many use cases. However, ES can include PageRank, ML, BM25, etc. in the ranking calculation, but it takes search relevance expertise to make it all work together for any particular use case. And different use cases will need different ranking equations.
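
As one illustration of putting BM25 and a PageRank-style signal into the same ranking calculation, here is a sketch of an index mapping and query body built as Python dicts. It assumes a text field named `content` (scored with BM25, the default similarity for text fields) and a `rank_feature` field named `pagerank`; both field names are hypothetical, and the saturation pivot of 8 is an arbitrary starting value that would need tuning.

```python
import json

# Hypothetical mapping: "content" is scored with BM25 (the default for text
# fields); "pagerank" stores a precomputed, query-independent link score.
mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "pagerank": {"type": "rank_feature"},
        }
    }
}


def build_query(user_query, pivot=8.0):
    """Combine BM25 text relevance with a static PageRank-style boost.

    The `match` clause must match and contributes the BM25 score. The
    `rank_feature` clause in `should` only adds to that score, through a
    saturation function so very large PageRank values cannot swamp the
    text relevance.
    """
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": user_query}},
                "should": {
                    "rank_feature": {
                        "field": "pagerank",
                        "saturation": {"pivot": pivot},
                    }
                },
            }
        }
    }


print(json.dumps(mapping, indent=2))
print(json.dumps(build_query("election results"), indent=2))
```

The JSON printed here could be sent with any Elasticsearch client or plain HTTP. How much weight the link signal gets relative to text relevance (the pivot, extra boosts, more features) is exactly the per-use-case tuning described above.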

Yes, of course. What I meant is: when I query a phrase, that phrase might appear in a million webpages, yet I get them back sorted by relevance. Surely that is a combination of two things: first, deep crawling that gathers data from most websites, and second, a good algorithm that sorts results by relevance based on a variety of signals. ES has nothing to do with crawling; that is up to the user. But for the content fed into ES, how much does it allow customizing signals and combining them into custom relevance logic, and how much does it allow modifying the indexing logic so that I can, say, use a combination of BM25 and PageRank?

It is very customizable, but signals like PageRank are best calculated outside ES and included as a field in your document.
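
Following that suggestion, here is a minimal sketch of computing PageRank outside ES by power iteration and attaching it to each document as a field before indexing. The link graph, the URLs and the `pagerank` field name are hypothetical. It uses the commonly cited damping factor of 0.85 and treats pages without outlinks as spreading their rank evenly over all pages, which is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration over a dict {page: [outlinks]}.

    Pages with no outlinks ("dangling" pages) spread their rank uniformly.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank held by dangling pages, redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal shares to the pages it links to.
        for page, targets in links.items():
            for t in targets:
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
scores = pagerank(links)

# Documents ready to be indexed, with the precomputed signal as a plain field.
documents = [{"url": url, "pagerank": score} for url, score in scores.items()]
for doc in documents:
    print(doc)
```

The link graph is global, so this computation belongs in a batch job that runs before indexing; ES then only has to read the stored number at query time.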

This is a weird question because they're so different. Google handles an incredible number of queries and has specialized code for detecting and handling different search intents, metrics for result quality, and AI systems to improve result quality. Popular and unpopular queries likely go through completely different code paths, and recent data likely has a separate code path too. In addition, remember that Google also has a crawler side. PageRank is also less of a thing than it used to be.

Elasticsearch is a sharded search index.
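
A toy sketch of what "sharded" means for indexing and querying. Each document is routed to a shard by hashing its ID modulo the number of primary shards (Elasticsearch's documented default is roughly `hash(_routing) % num_primary_shards` with a Murmur3 hash; `zlib.crc32` stands in for it here). A query is sent to every shard, each shard returns its local top-k hits, and a coordinating step merges them. The shard count, documents and term-overlap scoring function are all hypothetical simplifications, not BM25.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical; fixed when an Elasticsearch index is created


def shard_for(doc_id, num_shards=NUM_SHARDS):
    # Stand-in for routing: hash of the document ID modulo the shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_docs(docs, num_shards=NUM_SHARDS):
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def score(query, text):
    # Toy relevance: number of query terms present in the text (not BM25).
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def search(shards, query, k=2):
    partial = []
    # Scatter: every shard computes only its local top-k hits.
    for shard in shards:
        hits = [(score(query, text), doc_id) for doc_id, text in shard.items()]
        partial.extend(heapq.nlargest(k, [h for h in hits if h[0] > 0]))
    # Gather: the coordinator merges the partial lists into a global top-k.
    return heapq.nlargest(k, partial)


docs = {
    "1": "elasticsearch shards an index",
    "2": "google search at scale",
    "3": "search relevance with bm25",
    "4": "banana bread recipe",
}
shards = index_docs(docs)
print(search(shards, "search scale"))
```

Because the global top-k is always contained in the union of the per-shard top-k lists, adding shards spreads the work without changing the result. Scaling toward Google-level traffic is mostly about doing this across many machines and replicas, which is the hard operational part mentioned earlier.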

It's a little like comparing a bicycle (or, for this analogy, even a fire extinguisher; these things are just that different) to a firetruck. Sure, there are some similarities and some overlap, but a firetruck is massive and very purpose-built.

This seems like an 'Ask HN'. You might want to tag it that way.