Hacker News new | ask | show | jobs
by hobarrera 4077 days ago
sharding is what confuses me the most. How would you avoid these race conditions with a distributed database?
3 comments

You often can't and that's one of the reasons the NoSQL movement gets a lot of jip from people who have been writing database applications for a long time. It's very easy to end up with a referentially inconsistent database when not using a "real" database. For some kinds of applications, notably the kinds that Google developed a lot of in its early days, you can often get away with minor database corruption. If a page drops out of the web search index for a bit until some batch job comes along and fixes things up, ok, no big deal. Nobody was promised they'd be in the index. If you have giant but basically flat tables of entities that don't reference each other, then something like BigTable is exactly what you need.

If you're trying to build a social network that's full of graphs and edges between them ...... good luck. Google developed technologies like MegaStore and Spanner to handle this. Before it had those, it used huge sharded MySQL instances.

Pick a good shard key. In the case of a per-page-per-user ratings system, if you shard on the PageId, then you can locally check consistency to make sure there's no duplicate (PageId, UserId) keys. You can check the same if you shard on UserId, but then doing aggregates can be more difficult since you need to talk to every shard to find out a page's rating.
You can either reconcile periodically, or you can make it not distributed in the way that matters.