by spenczar5 2272 days ago
My pick is Jeff Dean and Luiz André Barroso's The Tail at Scale (2013) [1]. If you're interested in the performance of web services, it's very, very well-written and describes a counterintuitive (but crucial!) phenomenon: your p99.99 latency can drive the entire user experience because of head-of-line blocking. As microservice architectures have gotten more popular, this has only become more important.

It's a fiendishly difficult problem to get around, but Dean proposes a few mechanisms. I learned a lot and think back to this paper often when designing real software.

Reading it will probably make you a better engineer.

[1] https://dl.acm.org/doi/abs/10.1145/2408776.2408794


An argument I’ve had a few times and wish I didn’t:

Yes, this situation only happens a small fraction of the time, but it happens exactly once to each user, and the first time they use it. It doesn’t matter what the stats say. The facts are that every user will see this problem and it will be their first experience with the feature.

In such situations I usually ask, "Whose percentile are we talking about? The percentile of the latencies of the service, or the percentile of the wait time of the user?" Then I show that for many scenarios the percentile that the user experiences can be drastically different.
Could you elaborate on why this is any more crucial for microservices than it is for a monolith?
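
To make the "whose percentile" question concrete, here is a rough sketch that counts the same slow events two ways: as a share of requests and as a share of users. The workload is made up for illustration (1,000 users, 100 requests each, only the first request slow), and the function name `affected_shares` is hypothetical.

```python
def affected_shares(sessions, threshold_ms):
    """Return (share of slow requests, share of users who saw a slow request).

    sessions is a list of per-user latency lists, in milliseconds.
    """
    all_requests = [lat for session in sessions for lat in session]
    slow_request_share = sum(lat > threshold_ms for lat in all_requests) / len(all_requests)
    affected_user_share = sum(
        any(lat > threshold_ms for lat in session) for session in sessions
    ) / len(sessions)
    return slow_request_share, affected_user_share


# Assumed workload: each user's first request is slow (say, a cold start),
# and the remaining 99 requests are fast.
sessions = [[2000.0] + [50.0] * 99 for _ in range(1000)]
print(affected_shares(sessions, threshold_ms=1000.0))  # (0.01, 1.0)
```

Measured per request, the problem sits at the 99th percentile. Measured per user, it hits everyone.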
Intuitively, if your request is handled by 1 service you have 1 chance that your request lands at the extreme end of the latency distribution. If your requests require 20 services, that's 20 chances.

In reality, maybe someone can make each microservice much more performant and handle slow or failed requests gracefully in the UX. Some sites do, but it doesn't happen automatically by any means.
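
The "20 chances" intuition can be put into numbers. This is a back-of-the-envelope sketch that assumes every downstream call is independent and has the same chance of being slow, which real systems only approximate; `prob_any_slow` is a name made up for this example.

```python
def prob_any_slow(p_slow: float, n_calls: int) -> float:
    """Probability that at least one of n independent calls is slow."""
    return 1.0 - (1.0 - p_slow) ** n_calls


# Assume each call has a 1% chance of landing in the slow tail.
for n in (1, 20, 100):
    print(f"{n:>3} calls: {prob_any_slow(0.01, n):.1%} of requests hit a slow call")
```

With a 1% slow tail per call, about 18% of requests touching 20 services see at least one slow call, and about 63% of requests touching 100.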

A couple weeks ago someone brought up High Frequency Trading and while in theory it didn’t tell me anything I didn’t already know, I’ve been chewing on the thesis of the linked article ever since: that the real trick to doing things quickly is to do them consistently. That variance causes far more kinds of practical problems than does average response time.
Slow is smooth; smooth is fast.
Indeed.
Said a different way: for fork/join or barrier-style parallel requests, stragglers set the overall latency. Though the probability of any specific response being a straggler may be low, the probability of at least one response being a terrible straggler gets very high at large scale (or large fan-out).
This is a life-changing video by Gil Tene on this topic. The bottom line is that, because of the number of requests needed to support a single customer "request", the high percentiles are the ones that actually matter.
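
A small simulation shows the straggler effect. It is only a sketch: the log-normal latency distribution and its parameters are assumptions picked for illustration, not measurements, and the function names are hypothetical.

```python
import random
import statistics


def fan_out_latency(rng: random.Random, fan_out: int) -> float:
    # A fork/join request finishes only when its slowest sub-request does,
    # so its latency is the maximum over all sub-request latencies.
    return max(rng.lognormvariate(0.0, 1.0) for _ in range(fan_out))


def median_latency(fan_out: int, trials: int = 2000, seed: int = 1) -> float:
    # Median end-to-end latency over many simulated requests.
    rng = random.Random(seed)
    return statistics.median(fan_out_latency(rng, fan_out) for _ in range(trials))


for n in (1, 10, 100):
    print(f"fan-out {n:>3}: median latency {median_latency(n):.2f}")
```

Under these assumptions, the median request at fan-out 100 is slower than all but the rarest single calls: the typical user experience is set by the tail of the per-call distribution.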

https://www.youtube.com/watch?v=lJ8ydIuPFeU&list=WL&index=22...