by gizmo 960 days ago
DoorDash has about 35 million users, and there is zero interaction between users. The median user uses DoorDash maybe once a week. So 5 million sessions a day, all happening in the same 3-hour window. That's roughly 2 million sessions per hour at peak times.

How does DoorDash get to 1.2 million queries per second? 1.2M qps * roughly 10,000 seconds in 3 hours = 12 billion queries to process 5 million orders? That's wild. Is it all analytics? This is highly suspect. 35M users isn't nothing, but it isn't exactly Facebook scale either.
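
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The inputs are just the guesses from this comment (35 million users, one session per user per week, a 3-hour peak window), not published DoorDash figures, and the variable names are made up for illustration.

```python
# Back-of-envelope check of the figures in the comment above.
# All inputs are the commenter's guesses, not published DoorDash numbers.
USERS = 35_000_000
SESSIONS_PER_USER_PER_WEEK = 1      # "maybe once a week"
PEAK_WINDOW_HOURS = 3               # assumes every session lands in this window
PEAK_QPS = 1_200_000                # the figure quoted from the blog post

sessions_per_day = USERS * SESSIONS_PER_USER_PER_WEEK / 7
sessions_per_peak_hour = sessions_per_day / PEAK_WINDOW_HOURS

peak_window_seconds = PEAK_WINDOW_HOURS * 3600
queries_in_peak_window = PEAK_QPS * peak_window_seconds
queries_per_session = queries_in_peak_window / sessions_per_day

print(f"sessions per day:        {sessions_per_day:,.0f}")
print(f"sessions per peak hour:  {sessions_per_peak_hour:,.0f}")
print(f"queries in peak window:  {queries_in_peak_window:,.0f}")
print(f"queries per session:     {queries_per_session:,.0f}")
```

With the exact 10,800 seconds in 3 hours the total comes out slightly higher than the rounded 12 billion, but the order of magnitude is the same: a few thousand queries per session if the peak rate held for the whole window.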

8 comments

I’m not excusing the wild number, but just tossing out some additional load:
* Drivers checking in for work, especially if the apps poll automatically
* Drivers phoning home with live location updates
* Restaurants sending automated updates on order status
* Push notifications to users with status changes on their orders
* Users with multiple devices (like I have at least 5 devices with the UberEats app)
Yes, our server had 120k queries/sec, but 80% of that traffic was driver heartbeats or connection verification. We halved it by disabling the connection verification query.
Hold up, what do you mean by "our server"? Do you work for DoorDash?
Even just searching and browsing restaurants and menus is probably dozens of queries for every interaction.
Serializing json is pretty expensive at scale. I would be shocked if restaurant and menu json doesn't get cached aggressively.
Bit hard to cache search results.
Depends on how many different queries you think there are. There's a long tail of unique ones, but searching for e.g. pizza seems eminently cacheable, given magic algorithmic ranking.
> 12 billion queries to process 5 million orders?

2,400 queries per order? That's not that crazy IMHO. There might be significant database fan-out on each click (depending on how they do geographic lookups, search ranking / synonyms / sponsored stuff, the "repeat your last order" features, whether the ranked search returns the full object or a reference that then has to be individually queried, etc.). There might be many clicks per order because people browse a lot (first to find a restaurant, then to find dishes within the restaurant), leave reviews, poll for delivery status updates, etc.

That's fair but that also suggests most actions hit the main database directly instead of caching layers. Possible, but somewhat unusual at this scale.
In quorum systems like CockroachDB, non-leaders provide tons of extra capacity for eventually consistent reads. [edit: maybe a bit less so in a big database because at any instant one machine should be a leader for some shards and non-leader replica for others.] It's not always worth the complexity of having a high-hit-rate cache in front of that. Maybe no cache is needed, or just one to mitigate the worst of the hot spots.
> 2,400 queries per order? That's not that crazy IMHO.

Isn't that off by at least an order of magnitude though? It forces them to operate a much larger cluster than should be necessary.

> Isn't that off by at least an order of magnitude though?

No, for all the reasons I just said?

> It forces them to operate a much larger cluster than should be necessary.

How much machine cost and operational effort do you imagine they would save if they reduced the qps by a factor of 10 without changing the number of regions, number of tables, or size of the data? How much SWE time do you imagine that'd take to do and maintain?

I've run a global Paxos-based database that received two orders of magnitude more qps than this. It cost less than you're probably imagining. I sometimes hunted down silly queries, but mostly leader ops, and mostly to mitigate hot spots or as a quixotic latency reduction effort...overall, this was the cheapest layer of the system.

A query to a well-implemented OLTP database is not like a request to some Python/PHP/Ruby app.

Given how these blog posts are typically written, it is very likely the 1.2 million QPS figure is an all-time peak, not anything like an average.
According to their slides/video (bottom of the blog post), the 1.2 million QPS is their daily peak number, not the average.
It’s naive to think that database access is only happening when a customer makes an order. Each driver has workflows they exercise and data that needs to be stored. Same with vendors. There could be operational data for their infrastructure.

That said, 1.2 million queries per second is wild. Would be interesting to see the breakdown.

> there is zero interaction between users

A curious description for a platform which acts as a broker for transactions between users!

Massive amounts of user tracking.
Your numbers are off by a large factor.
It's all the new reminders telling users to tip.