by Beanis 3221 days ago
What distinction is being made between sharding by entity and sharding a graph? The approach seems to be the same, just with different naming.

Yeah, that section doesn't make much sense. I don't think there are any real examples of "graph sharding" in the wild. The graph databases that are available, like Neo4j, usually don't provide horizontal partitioning natively. (Not surprising: finding a minimum k-cut of a graph is NP-hard when k is part of the input, and so is the balanced partitioning a sharding scheme would actually want. Doing this incrementally on a dynamic graph is even harder.)

The one solution that was mentioned, Facebook's TAO, isn't really a database; it's a cache, which means it doesn't have to deal with sharding the way persistent stores do. And it doesn't really shard at all; it basically stores a complete copy of the world's social graph in every region, which it can populate from that region's MySQL replicas. (It's amazing what you can do when you can be eventually consistent.)

(From what I recall, the main social network's MySQL also isn't really sharded by graph in any fancy way; it's basically "just" hash-sharded by entity ID.)
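To make "hash-sharded by entity ID" concrete, here is a minimal sketch. It is my own toy illustration, not Facebook's code: the shard count, the choice of SHA-256, and the names `shard_for` and `cross_shard_edges` are all assumptions. The point is that placement looks only at the ID and never at the edges, so nothing keeps connected entities together.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(entity_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map an entity ID to a shard with a stable hash.

    hashlib is used instead of the built-in hash() so that the
    mapping is the same across processes and runs.
    """
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cross_shard_edges(edges, num_shards: int = NUM_SHARDS) -> int:
    """Count edges whose two endpoints land on different shards."""
    return sum(
        1
        for src, dst in edges
        if shard_for(src, num_shards) != shard_for(dst, num_shards)
    )


# Toy "friendship" edges between made-up user IDs.
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]

# Placement depends only on the ID, never on who is connected to whom.
placement = {uid: shard_for(uid) for uid in sorted({u for e in edges for u in e})}
cut = cross_shard_edges(edges)

print(placement)
print(f"{cut} of {len(edges)} edges cross shards")
```

A graph-aware partitioner would try to choose the placement so that the cross-shard count is small while the shards stay balanced, which is the NP-hard part. Hash sharding skips that problem entirely and accepts whatever cut it gets.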

Dgraph does try to provide horizontal scaling out of the box. The sharding is done by predicate; see https://docs.dgraph.io/deploy/#multiple-instances for the documentation. I am not sure how it behaves for very frequent predicates, though.
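For contrast, a rough sketch of what sharding by predicate means. This is a toy model, not Dgraph's implementation or API: the group count, the CRC32-based assignment, and the names `group_for` and `place` are assumptions made for brevity, and the real system may assign predicates to groups quite differently.

```python
import zlib
from collections import defaultdict

NUM_GROUPS = 3  # hypothetical number of server groups


def group_for(predicate: str, num_groups: int = NUM_GROUPS) -> int:
    """Assign a whole predicate to one group (stable CRC32, illustrative only)."""
    return zlib.crc32(predicate.encode("utf-8")) % num_groups


def place(triples, num_groups: int = NUM_GROUPS):
    """Route each (subject, predicate, object) triple by its predicate alone."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        groups[group_for(predicate, num_groups)].append((subject, predicate, obj))
    return groups


# Made-up data: one very common predicate and two rare ones.
triples = [
    ("alice", "friend", "bob"),
    ("bob", "friend", "carol"),
    ("carol", "friend", "alice"),
    ("alice", "name", "Alice"),
    ("bob", "lives_in", "Paris"),
]

groups = place(triples)
friend_group = group_for("friend")

for group_id, rows in sorted(groups.items()):
    print(group_id, rows)
```

In this toy model every "friend" edge ends up in the same group no matter how many there are, which is exactly the situation the question about very frequent predicates is asking about.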