Hacker News new | ask | show | jobs
by jordanthoms 965 days ago
Sharding is anything but simple. A single shard per region wouldn't have enough write capacity so they'd have to be managing likely 100+ shards in each region - you'd have to build a lot of infrastructure to automate setting those up, rebalancing traffic to avoid hot spots and underutilized shards, in sync with schema migrations etc.

Even after that, now your applications using the DB have to be aware of the sharding - interactions between users who are housed on different shards etc could require a lot of work at the application layer. If your customers can be easily be split into tenants which never interact with each other this isn't so bad but for a consumer app like DoorDash there isn't clear tenant boundaries.

We looked at all this for Kami and realised that it would be much easier for us to move from PostgreSQL to CockroachDB (we had exceeded the write capacity of a single PostgreSQL primary) than to shard Postgres, and it'd make future development much faster. We could have made sharding work if we had to... but it's not 2013 any more and we have distributed SQL databases, why not use them?

1 comments

That’s surprising — the education market seems like an even better fit for sharding: students and teachers generally stay within the context of a single school.
It does seem like there would be a clean boundary between each school district, but actually there's plenty of sharing and collaboration on Kami that happens with users between districts, teachers and students move schools, parents can have children in different school districts, etc. Even a single classroom assignment can cross those, e.g. when someone external comes in to do a training session.

We could have modified our application layer to handle those cases, but it's a lot of extra complexity and room for error, and we'd have had to consider and solve for all of these cross-tenant situations as we add new functionality, so I was really keen to avoid that.

Also, there are some really big districts - NYCDOE has >1.1 million students and 1,800 schools. Even with them on a dedicated shard, it's quite possible that it'd get overloaded and we'd be spending more dev effort figuring out how to safely split them onto multiple shards.

When we looked at using distributed SQL database instead it was a clear win - from the application's perspective, it just looks like a really, really big PostgreSQL box, so we didn't need to change much. (the SQL support is very close to PG - The most annoying thing for us was the lack of trigram indexes, and Cockroach has now added those now). And in terms of the operational side, upgrading and maintaining CRDB has actually been easier than PG - version upgrades are easier to do without downtime, and schema migrations don't lock tables.