Hacker News new | ask | show | jobs
by rickreynoldssf 962 days ago
I'm not really seeing why DoorDash needs all their operational data in one monster clustered database. I would think its so much simpler to shard the data by region for operational queries and aggregate in the background for long-term storage.
8 comments

Sharding is anything but simple. A single shard per region wouldn't have enough write capacity so they'd have to be managing likely 100+ shards in each region - you'd have to build a lot of infrastructure to automate setting those up, rebalancing traffic to avoid hot spots and underutilized shards, in sync with schema migrations etc.

Even after that, now your applications using the DB have to be aware of the sharding - interactions between users who are housed on different shards etc could require a lot of work at the application layer. If your customers can be easily be split into tenants which never interact with each other this isn't so bad but for a consumer app like DoorDash there isn't clear tenant boundaries.

We looked at all this for Kami and realised that it would be much easier for us to move from PostgreSQL to CockroachDB (we had exceeded the write capacity of a single PostgreSQL primary) than to shard Postgres, and it'd make future development much faster. We could have made sharding work if we had to... but it's not 2013 any more and we have distributed SQL databases, why not use them?

That’s surprising — the education market seems like an even better fit for sharding: students and teachers generally stay within the context of a single school.
It does seem like there would be a clean boundary between each school district, but actually there's plenty of sharing and collaboration on Kami that happens with users between districts, teachers and students move schools, parents can have children in different school districts, etc. Even a single classroom assignment can cross those, e.g. when someone external comes in to do a training session.

We could have modified our application layer to handle those cases, but it's a lot of extra complexity and room for error, and we'd have had to consider and solve for all of these cross-tenant situations as we add new functionality, so I was really keen to avoid that.

Also, there are some really big districts - NYCDOE has >1.1 million students and 1,800 schools. Even with them on a dedicated shard, it's quite possible that it'd get overloaded and we'd be spending more dev effort figuring out how to safely split them onto multiple shards.

When we looked at using distributed SQL database instead it was a clear win - from the application's perspective, it just looks like a really, really big PostgreSQL box, so we didn't need to change much. (the SQL support is very close to PG - The most annoying thing for us was the lack of trigram indexes, and Cockroach has now added those now). And in terms of the operational side, upgrading and maintaining CRDB has actually been easier than PG - version upgrades are easier to do without downtime, and schema migrations don't lock tables.

I've never done geographic sharding but it seems kind of hard. How do you pick shard boundaries? How do you deal with entities who are near the boundaries and whose current operational data therefore spans >1 shards? (Imagine somebody at near the geographic intersection of like, five shards looking for pizza in a 10 miles radius or w/e)

Also the majority of entities they're tracking (users, drivers) do not have fixed locations.

Maybe it's not as hard as I'm thinking. I guess you just have to accept that any query can span an arbitrary number of shards and the results need to be union'd.

I'm sure a lot of smart people have tackled this at the DoorDashes and Ubers of the world and maybe there's some optimal way of handling it. I would love to hear about that.

Great points that show regional databases are not obviously simpler than one global database, especially from the application developer's perspective.

> How do you deal with entities who are near the boundaries and whose current operational data therefore spans >1 shards? (Imagine somebody at near the geographic intersection of like, five shards looking for pizza in a 10 miles radius or w/e)

Hitting 5 shards might not be that bad. I think you could divide the world into sufficiently large hexagonal tiles; you'd hit at most three shards then. Maybe each fixed-size tile is a logically separate database. Some would be much hotter than others, so you don't want to like back each by a fixed-size traditional DBMS or something; that'd be pretty wasteful.

> Also the majority of entities they're tracking (users, drivers) do not have fixed locations.

Yeah, you at least want a global namespace for users with consistent writes. The same email address belonging to different people in different regions is unacceptable. In theory the global data here could just be a rarely-updated breadcrumb pointing to which database holds the "real" data for that user. [1] So you can make the global database be small in terms of data size, rarely-written, mostly read in eventually consistent fashion, and not needed for geographically targeted queries. That could be worthwhile. YMMV.

[1] iirc Spanner and Megastore have a feature called (something like) "global homing" that is somewhat similar to this. For a given entity, the bulk of the data is stored in some subset of the database's replicas, and bread crumbs indicate which. If you get a stale bread crumb, you follow the trail, so looking up bread crumbs with eventually consistent reads is fine. [edit to add a bit more context:] One major use case for this is Gmail. It has tons of regions in total, but replicating each user's data to more than 2 full replicas + 1 witness would be absurdly wasteful.

[edit:] looks like CockroachDB has the concept of a per-row preferred region, which might also be vaguely similar. <https://www.cockroachlabs.com/docs/v23.1/table-localities#re...> I haven't used CockroachDB and only skimmed this doc section.

I think the global homing feature you are describing belongs to applications built atop Megastore (and Spanner). That is, Megastore itself is agnostic to it, but the application knows how to resolve the homing.
You could be right. My memory's fuzzy on this, and I don't see this feature in a casual skim of the Cloud Spanner docs.
Thanks for a thoughtful and informative reply, I learned some things.
> I've never done geographic sharding but it seems kind of hard. How do you pick shard boundaries? How do you deal with entities who are near the boundaries and whose current operational data therefore spans >1 shards? (Imagine somebody at near the geographic intersection of like, five shards looking for pizza in a 10 miles radius or w/e)

You could do it by market (eg. SFBA, Los Angeles, San Diego) or by state.

They would have to have many shards per city to keep up with the level of write traffic though. And what happens when a user from SFBA goes down to LA?
Would they?

I mean, I've seen conventional SQL databases handle ten million orders per hour on a single host. I find it hard to believe DoorDash is processing more than ten million orders per hour, even in a large city.

I suppose they might exceed what a single host can handle if they're, I don't know, recording every driver's location once per second?

    I suppose they might exceed what a single host 
    can handle if they're, I don't know, recording 
    every driver's location once per second?
Even then, that's not that much data. You only need to retain the current location of the driver and you can aggressively prune data more than N seconds old.

A quick Google suggests there are 2M Doordash drivers, but I'll assume that's "all drivers who have ever signed up for DoorDash, ever" and the number of DoorDash drivers actually working at any given moment is a small fraction of that.

If we assume that a max of 100,000 drivers are working at any moment, and a slightly more relaxed location update interval of 10 seconds, that's 10K updates per second which is not exactly super high performance stuff. Of course, tracking driver location is just once piece of their operations.

You could probably just do it by continent as no one from SA would order anything from Europe but at that point each of those databases is probably big enough that you would need a similar solution as if you just had a single one

    You could probably just do it by continent as no 
    one from SA would order anything from Europe
Imagine a luxury version of DoorDash that does work this way. As I awake in my luxury palace in Sao Paulo, I realize that I would like some fresh grapes from the champagne region of France. With a few taps on my Luxury Door Dash app, a plane is on its way with my grapes.
Cockroach automates the sharding of data by region and provides tools that let you use and manage it more like a traditional database. If they didn't use cockroach, they would have to write/setup tools and adapters to do all that anyway. It would probably be more familiar to developers conceptually if they used traditional sharding, but why build and maintain all that when you can just use off the shelf software?
CockroachDB had "Follow the sun" multi-region data balancing, which then got generalized to "Follow the workload"[0].

[0] https://www.cockroachlabs.com/docs/stable/topology-follow-th...

Because CockroachDB is a vendor that abstracts away all the thinking parts of running a database cluster. They do regional sharding, clustering, consistency, etc. for you.

They could have just as easily dropped in Oracle. You pay for expensive DB up front, and can hire cheaper junior DBAs and developers going forward.

The article says they have 300+ clusters, not one monster one.
Manual sharding is a crutch and a pain. Just use a distributed database, and let the database company worry about it.
Resume driven development
But it's at scale!!!!!