Hacker News new | ask | show | jobs
by avidiax 75 days ago
When I give system design interviews, candidates that start adding queues reflexively to the design always do poorly.

Queueing is only useful for a few cases, IMO:

* The request is expensive to reject. For example, the inputs to the rejected request also came from expensive requests or operations (like a file upload). So rejecting the request because of load will multiply the load on other parts of the system. You still need backpressure or forwardpressure (autoscaling).

* Losing a request is expensive, delaying the result is not. Usually you want a suitably configured durable queueing system (e.g. Kafka) if you have this scenario.

* A very short queue is acceptable if it's necessary that downstream resources are kept 100% busy. A good example of this is in a router, the output to a slower link might queue 1-2 packets so that there is always something to send, which maximizes throughput.

* If you have very bursty traffic, you can smooth the bursts to fit in your capacity. But this runs the danger of having the queue always full, which you have to manage with load shedding (either automated or manual).

----

An underappreciated queue type is LIFO (last-in, first-out). It sounds unfair, but it keeps you from moving the median response time at the cost of the maximum response time, and it behaves well when full. It fails over into either responding quickly or just rejecting requests when full, so it works well for dealing with bursty traffic.

2 comments

> An underappreciated queue type is LIFO (last-in, first-out). It sounds unfair, but it keeps you from moving the median response time at the cost of the maximum response time

Why is that beneficial?

Suppose you are a building contractor. You have given start dates for future jobs, but your current job is going to run over the expected time. You can choose between:

1 slip every job, annoying all of the customers whose jobs are queued up. You get a bad reputation.

2 Move onto the next job on time, and gradually complete the stalled job in the background by sending workers back to it when you have spare (which you should have, because in general you must overestimate or things will go badly wrong). That customer will now suffer because their job is going to take a multiple of the expected time, but all of the other customers are happy, so your reputation is good.

I’ve observed airlines will do this as well if they have maintenance or gate queues. They will sacrifice 1-2 flights (hours late or even cancelled) to keep many other flights near on-time. Fewer angry customers, better reported average “on-time” metrics.
Author here. It's a great analogy.

I had a section in the post I cut out about how optimizing queue selection started out as a technical problem, but transformed into more of a business and ethical problem the more I pondered it.

You're effectively deciding how to distribute suffering across a large group of people.

Comes up in any situation where large metric gains can be accomplished by optimizing for specific groups - recommender and personalization systems are another example.

A short queue is different from a long one. Toyota keeps a box of bolts on the line - a type of queue - instead of ordering them individually as needed. However there should be just enough queue - if it backs up that is a problem.
That sort of queue is fundamentally different than network queues. The individual bolts are not impatient waiting to be installed. There is no P99 bolt install latency measuring from when the bolt arrives in the factory.

You therefore only need enough bolts at each station that they won't run out before the restocker completes a lap, and such that there aren't so many that they get in the way.