by realfun 2895 days ago
Thanks for sharing EGreg.

Elasticsearch has geo-indexing as well (based on geohash internally), and by default it does ID hashing similar to what you described (murmurhash3). We actually leverage that for location-based searches.

The challenge addressed in the blog is not how to search or address the data (as I said, Elasticsearch handles that already). It is how to distribute the load so that calculation only happens on a limited number of nodes, and how to reduce the index size so that queries run faster.

1 comments

Ah, the goal makes sense. I would suggest that it’s not so bad to have a controller node fan out and fan in queries, as long as the database can handle many concurrent queries. Essentially you’re distributing the work evenly across nodes, but you don’t have affinity for a particular node. Yes, there is more latency (each query is as slow as the slowest connection), but it is endlessly scalable. That said, I’m sure I’m missing some benefits of localizing calculations to only a node or two.
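
To make the fan-out/fan-in idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the in-memory SHARDS list stands in for real nodes, and query_shard stands in for a network call. It only shows the shape of the pattern, not how any particular database does it.

```python
import asyncio

# Hypothetical in-memory shards; each row is (place_id, distance_km).
SHARDS = [
    [("a", 1.2), ("b", 7.5)],
    [("c", 0.4)],
    [("d", 3.1), ("e", 9.9)],
]

async def query_shard(rows, max_km):
    # Stand-in for a network round trip to one node.
    await asyncio.sleep(0)
    return [row for row in rows if row[1] <= max_km]

async def fan_out_search(max_km):
    # Fan-out: issue one concurrent query per shard.
    partials = await asyncio.gather(
        *(query_shard(rows, max_km) for rows in SHARDS)
    )
    # Fan-in: merge the partial results and sort by distance.
    # gather() waits for every shard, so latency tracks the slowest one.
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row[1])

def search(max_km):
    return asyncio.run(fan_out_search(max_km))
```

Every query touches every shard here, which is the trade-off mentioned above: even load, no node affinity.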

In the scheme above, by the way, it DOES localize searches to one shard. Essentially, all relations to a stream live on the same shard as the stream. Each center+radius has one associated stream, so the search takes place on one shard.
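
A rough sketch of that co-location idea, under stated assumptions: the stream naming in stream_key, the shard count, and the md5-based shard_for are all made up for illustration (md5 is just a stable stand-in hash, not what any specific system uses). The point is only that relations are stored under the stream's key, so a lookup for one center+radius hashes to exactly one shard.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # Stable hash of the stream key, reduced to a shard index.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def stream_key(lat: float, lon: float, radius_km: int) -> str:
    # Hypothetical naming: one stream per quantized center plus radius.
    return f"nearby/{round(lat, 2)}/{round(lon, 2)}/{radius_km}"

def add_relation(stream: str, item: str) -> None:
    # Relations are stored on the same shard as their stream.
    shards[shard_for(stream)].setdefault(stream, []).append(item)

def search(lat: float, lon: float, radius_km: int):
    # One center+radius maps to one stream, hence one shard.
    key = stream_key(lat, lon, radius_km)
    shard = shard_for(key)
    return shard, shards[shard].get(key, [])
```

The write path does the extra work (an item has to be related to every stream whose circle covers it), and in exchange the read path is a single-shard lookup.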