Hacker News new | ask | show | jobs
by evanelias 816 days ago
> That said, rolling your own sharding is a MASSIVE undertaking.

It's a large challenge, but it's absolutely doable. A ton of companies did this 10-15 years ago, basically every successful social network, user generated content site, many e-commerce sites, massively multiplayer games, etc. Today's pre-baked solutions didn't exist then, so we all just rolled our own, typically on MySQL back then.

With DIY, the key thing is to sidestep any need for cross-shard joins. This is easier if you only use your relational DB for OLTP, and already have OLAP use-cases elsewhere.

Storing "association" type relation tables on 2 shards helps tremendously too: for example, if user A follows user B, you want to record this on both user A and user B's shards. This way you can do "list all IDs of everyone user A follows" as well as "list all IDs of users following user B" without crossing shard boundaries. Once you have the IDs, you have to do a multi-shard query to get the actual user rows, but that's a simple scatter-gather by ID and easy to parallelize.

Implementing shard splitting is hard, but again definitely doable. Or avoid it entirely by putting many smaller shards on each physical DB server -- then instead of splitting a big server, you can just move an entire shard to another server, which avoids any row-by-row operations.

Many other tricks like this. It's a lot of tribal knowledge scattered across database conference talks from a decade ago :)

1 comments

It's definitely doable. I was at Google circa 2006, pre Spanner, with sharded MySQL. Ads ran on top of it. It was a pain.

And yes, there are many tricks like having more logical shards than physical ones, collocating tables by the same shard_id, etc. It's still difficult. You need tooling for everything from shard splitting (even if that is just loving a logical shard), to schema migrations, not to mention if you end up needing cross-shard transactions or cross-shard joins.

Generally, you'd need a team of very strong infrastructure engineers. Most companies don't have the resources for that. There are definitely some engineers out there that could whip this all together.