Hacker News
by jfim 965 days ago
> I've never done geographic sharding but it seems kind of hard. How do you pick shard boundaries? How do you deal with entities who are near the boundaries and whose current operational data therefore spans >1 shards? (Imagine somebody near the geographic intersection of like, five shards looking for pizza in a 10-mile radius or w/e)

You could do it by market (eg. SFBA, Los Angeles, San Diego) or by state.
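A minimal sketch of market-based sharding that also handles the boundary case from the quoted question. The market names come from the comment above, but the bounding boxes are rough, hypothetical coordinates chosen only for illustration. Fanning a radius query out to every market whose box overlaps the search area is one possible approach, not something the commenter described.

```python
import math
from dataclasses import dataclass

MILES_PER_DEG_LAT = 69.0  # approximate; fine for a coarse bounding box


@dataclass(frozen=True)
class Market:
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


# Hypothetical, very rough bounding boxes; a real system would use proper polygons.
MARKETS = [
    Market("SFBA", 36.9, 38.9, -123.1, -121.2),
    Market("Los Angeles", 33.6, 34.9, -119.0, -117.2),
    Market("San Diego", 32.5, 33.5, -117.6, -116.0),
]


def search_box(lat: float, lon: float, radius_miles: float):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of radius_miles."""
    dlat = radius_miles / MILES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def shards_for_query(lat: float, lon: float, radius_miles: float) -> list[str]:
    """Every market whose box overlaps the search box; the query fans out to all of them."""
    lo_lat, hi_lat, lo_lon, hi_lon = search_box(lat, lon, radius_miles)
    return [
        m.name
        for m in MARKETS
        if lo_lat <= m.max_lat and hi_lat >= m.min_lat
        and lo_lon <= m.max_lon and hi_lon >= m.min_lon
    ]
```

With these made-up boxes, a 10-mile search in central San Francisco touches only the SFBA shard, while one near the Los Angeles/San Diego edge comes back with both markets, so the application would query both shards and merge the results.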

They would have to have many shards per city to keep up with the level of write traffic though. And what happens when a user from SFBA goes down to LA?
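One hypothetical way to address both questions: split each market into a fixed number of sub-shards by hashing the order ID, and derive the shard from where the order is delivered rather than where the user lives. The per-market shard counts and the key format below are assumptions for illustration, not anything DoorDash is known to use.

```python
import zlib

# Hypothetical number of sub-shards per market; busier markets could get more.
SHARDS_PER_MARKET = {"SFBA": 8, "Los Angeles": 8, "San Diego": 4}


def shard_for_order(delivery_market: str, order_id: str) -> str:
    """Pick a sub-shard inside the market the order is delivered to.

    zlib.crc32 is used instead of Python's built-in hash(), which is
    randomized per process for strings and so would not give a stable mapping.
    """
    n = SHARDS_PER_MARKET[delivery_market]
    index = zlib.crc32(order_id.encode("utf-8")) % n
    return f"{delivery_market}-{index}"
```

Under this scheme, an SFBA user ordering in LA simply lands on an LA shard (`shard_for_order("Los Angeles", "order-123")`); the user's own profile would live in a separate table sharded by user ID.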
Would they?

I mean, I've seen conventional SQL databases handle ten million orders per hour on a single host. I find it hard to believe DoorDash is processing more than ten million orders per hour, even in a large city.
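As a rough sanity check on the commenter's figure (the ten-million-orders-per-hour number is theirs; the conversion is just arithmetic), spreading that load evenly over an hour works out to under 3,000 writes per second:

```python
def per_second(events_per_hour: float) -> float:
    """Convert an hourly rate to an average per-second rate (assumes uniform load)."""
    return events_per_hour / 3600


orders_per_second = per_second(10_000_000)
print(f"{orders_per_second:,.0f} orders/s")  # prints "2,778 orders/s"
```

Real traffic is bursty, so peak rates would exceed this average.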

I suppose they might exceed what a single host can handle if they're, I don't know, recording every driver's location once per second?

> I suppose they might exceed what a single host can handle if they're, I don't know, recording every driver's location once per second?

Even then, that's not that much data. You only need to retain the current location of the driver and you can aggressively prune data more than N seconds old.
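A minimal in-memory sketch of the retention scheme described above: each update overwrites the driver's previous position, and a periodic prune drops drivers whose last update is older than N seconds. The class and method names are hypothetical, and a real deployment would likely keep this in a shared store rather than a Python dict.

```python
import time
from typing import Optional


class LatestLocations:
    """Keeps only the most recent (timestamp, lat, lon) per driver."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds  # the "N seconds" from the comment
        self._by_driver: dict[str, tuple[float, float, float]] = {}

    def update(self, driver_id: str, lat: float, lon: float,
               now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        # Overwrite: older positions for this driver are simply discarded.
        self._by_driver[driver_id] = (ts, lat, lon)

    def get(self, driver_id: str) -> Optional[tuple[float, float]]:
        entry = self._by_driver.get(driver_id)
        return None if entry is None else (entry[1], entry[2])

    def prune(self, now: Optional[float] = None) -> int:
        """Remove drivers not heard from in max_age seconds; return how many were removed."""
        ts = time.time() if now is None else now
        stale = [d for d, (t, _, _) in self._by_driver.items() if ts - t > self.max_age]
        for d in stale:
            del self._by_driver[d]
        return len(stale)

    def __len__(self) -> int:
        return len(self._by_driver)


# Usage with explicit timestamps so the result is deterministic.
store = LatestLocations(max_age_seconds=30)
store.update("d1", 37.77, -122.42, now=0)
store.update("d1", 37.78, -122.41, now=10)  # replaces d1's first point
store.update("d2", 34.05, -118.24, now=5)
removed = store.prune(now=40)  # d2 last seen 35 s ago is dropped; d1 (30 s) is kept
```

Memory stays proportional to the number of active drivers, not to how long the system has been running.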

A quick Google suggests there are 2M DoorDash drivers, but I'll assume that's "all drivers who have ever signed up for DoorDash, ever" and the number of DoorDash drivers actually working at any given moment is a small fraction of that.

If we assume that a max of 100,000 drivers are working at any moment, and a slightly more relaxed location update interval of 10 seconds, that's 10K updates per second, which is not exactly super high-performance stuff. Of course, tracking driver location is just one piece of their operations.
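The same back-of-envelope arithmetic, using the commenter's assumed 100,000 active drivers, with the one-second interval from the earlier comment included for comparison. The driver count and both intervals are the thread's assumptions, not measured figures.

```python
def location_writes_per_second(active_drivers: int, interval_seconds: float) -> float:
    """Average location updates per second if every active driver reports once per interval."""
    return active_drivers / interval_seconds


for interval in (10, 1):
    rate = location_writes_per_second(100_000, interval)
    print(f"every {interval:>2} s: {rate:,.0f} updates/s")
# prints:
# every 10 s: 10,000 updates/s
# every  1 s: 100,000 updates/s
```

Going from a 10-second to a 1-second interval multiplies the write rate by ten, which is why the interval choice matters more than the raw driver count here.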