Say you have one collection for your people (sharded over 100 servers, say) and another one for conferences (also sharded over 100 machines). Then you could hold the primary keys of all conferences a person attended in a JSON list stored in an attribute with the user. A query finding all people with last name "Jones" that have attended a given conference can now be executed efficiently by using a secondary index on the last name of people and performing a key lookup in the conferences collection. The latter only has to talk to one shard, if the conferences collection is sharded by key and can thus be done efficiently. Obviously, one needs a query optimizer that is aware of the distribution of the shards and the shard keys, but this is certainly doable.
just because the dataset is sharded doesn't mean that one query has to hit every shard. for example, suppose you're looking for documents with `parent_id = foo` and your sharding key is `parent_id`, then an intelligent query planner would only query one shard (the one that "foo" hashes to), and then this looks a lot like a join in an RDBMS. indeed, if you wanted to do (in RDBMS terms) a self-join to load the whole tree of documents rooted at parent_id = foo, and your sharding key were the root for each document, that query would only hit one shard with a. the trick is deciding which keys to shard on (and, in many cases, what other keys to shard on in redundant datastores that serve different types of queries).
Thanks for both your answers, this is really interesting indeed. I always thought that joins are a "no, no, no" in the NoSQL world. This opens up a whole lot of new possibilities. I will have to have a look at this ArangoDB thing...