Hacker News


	by rafekett 4282 days ago
	Just because the dataset is sharded doesn't mean that one query has to hit every shard. For example, suppose you're looking for documents with `parent_id = foo` and your sharding key is `parent_id`: an intelligent query planner would query only one shard (the one that "foo" hashes to), and then this looks a lot like a join in an RDBMS. Indeed, if you wanted to do (in RDBMS terms) a self-join to load the whole tree of documents rooted at `parent_id = foo`, and your sharding key were the root of each document's tree, that query would also hit only one shard. The trick is deciding which keys to shard on (and, in many cases, which other keys to shard on in redundant datastores that serve different types of queries).
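A minimal Python sketch of the routing idea, assuming plain hash partitioning. `ShardedStore`, `shard_for`, `find`, and the `root_id` field are hypothetical names for illustration, not any particular database's API. An equality lookup on the shard key is sent to the single shard that owns that value; a lookup on any other field has to fan out to every shard and gather the results.

```python
import hashlib
from typing import Any

Doc = dict[str, Any]


class ShardedStore:
    """Toy hash-sharded document store; every name here is hypothetical."""

    def __init__(self, num_shards: int, shard_key: str) -> None:
        self.shard_key = shard_key
        self.shards: list[list[Doc]] = [[] for _ in range(num_shards)]

    def shard_for(self, value: Any) -> int:
        # Stable hash: Python's built-in hash() is salted per process for str.
        digest = hashlib.md5(str(value).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, doc: Doc) -> None:
        self.shards[self.shard_for(doc[self.shard_key])].append(doc)

    def find(self, field: str, value: Any) -> tuple[list[Doc], list[int]]:
        """Return (matching docs, indices of the shards the query touched)."""
        if field == self.shard_key:
            # Equality on the shard key: only the owning shard can match.
            targets = [self.shard_for(value)]
        else:
            # Any other field: scatter to every shard, then gather the results.
            targets = list(range(len(self.shards)))
        hits = [d for i in targets for d in self.shards[i] if d.get(field) == value]
        return hits, targets


docs = [
    {"id": "foo", "parent_id": None, "root_id": "foo"},
    {"id": "a", "parent_id": "foo", "root_id": "foo"},
    {"id": "b", "parent_id": "a", "root_id": "foo"},
    {"id": "bar", "parent_id": None, "root_id": "bar"},
]

# Primary store sharded on the tree root: loading a whole tree is single-shard.
by_root = ShardedStore(num_shards=4, shard_key="root_id")
# Redundant copy sharded on parent_id, to serve "children of X" queries.
by_parent = ShardedStore(num_shards=4, shard_key="parent_id")
for doc in docs:
    by_root.insert(doc)
    by_parent.insert(doc)

tree, tree_shards = by_root.find("root_id", "foo")
kids_slow, slow_shards = by_root.find("parent_id", "foo")
kids_fast, fast_shards = by_parent.find("parent_id", "foo")
print(len(tree), len(tree_shards))                      # 3 1
print([d["id"] for d in kids_slow], len(slow_shards))   # ['a'] 4
print([d["id"] for d in kids_fast], len(fast_shards))   # ['a'] 1
```

The second, `parent_id`-sharded copy is one way to read the "redundant datastores" remark: every write goes to both copies, and in exchange a second access pattern also becomes a single-shard read instead of a scatter-gather.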

Right, you were quicker but are essentially saying the same thing as I said in my example.