by Dowwie 2272 days ago
Could you elaborate on why this is any more crucial for microservices than it is for a monolith?
2 comments

Intuitively, if your request is handled by 1 service you have 1 chance that your request lands at the extreme end of the latency distribution. If your requests require 20 services, that's 20 chances.
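
To put rough numbers on that intuition, here is a minimal sketch. It assumes each service's latency is independent of the others and takes "the extreme end" to mean slower than that service's 99th percentile; both are illustrative assumptions, not something stated in the thread. The function name `prob_any_tail` is made up for this example.

```python
def prob_any_tail(n_services: int, tail_prob: float = 0.01) -> float:
    """Probability that at least one of n independent calls lands in the tail.

    tail_prob is the (assumed) chance that a single call is a tail event,
    e.g. 0.01 for "slower than the 99th percentile".
    """
    # P(no call hits the tail) = (1 - p)^n, so P(at least one does) = 1 - (1 - p)^n
    return 1.0 - (1.0 - tail_prob) ** n_services


for n in (1, 20, 100):
    print(f"{n:>3} services: {prob_any_tail(n):.1%} chance of at least one tail response")
```

Under those assumptions, 1 service gives a 1% chance, 20 services give roughly 18%, and 100 give roughly 63%. Real services are rarely independent (shared hosts, shared databases), so treat this as an order-of-magnitude illustration.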

In reality, maybe someone is able to make each microservice so much more performant, and to deal with slow or failed requests gracefully in the UX. Some sites do, but it doesn't happen automatically by any means.

A couple weeks ago someone brought up High Frequency Trading and while in theory it didn’t tell me anything I didn’t already know, I’ve been chewing on the thesis of the linked article ever since: that the real trick to doing things quickly is to do them consistently. That variance causes far more kinds of practical problems than does average response time.
Slow is smooth; smooth is fast.
Indeed.
Said a different way: for fork/join or barrier-style parallel requests, stragglers set the overall latency, and though the probability of any specific response being a straggler may be low, the probability of at least one response being a terrible straggler gets very high at large scales (or large fan-outs).
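
A small simulation can make the straggler effect concrete. This is only a sketch: the log-normal latency model, its parameters, and the helper names (`fanout_latency`, `percentile`, `median_latency`) are all hypothetical choices for illustration, not measurements from any real system.

```python
import random


def fanout_latency(fan_out: int, rng: random.Random) -> float:
    """Latency of one fork/join request: the slowest of fan_out parallel calls."""
    # Assumed per-call latency model: log-normal with a median near 10 ms
    # and a long right tail. The join has to wait for the slowest call.
    return max(rng.lognormvariate(2.3, 0.5) for _ in range(fan_out))


def percentile(samples, q: float) -> float:
    """Simple nearest-rank style percentile, q in [0, 1]."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def median_latency(fan_out: int, trials: int = 5000, seed: int = 42) -> float:
    """Median overall latency across many simulated fork/join requests."""
    rng = random.Random(seed)
    return percentile([fanout_latency(fan_out, rng) for _ in range(trials)], 0.5)


for fan_out in (1, 10, 100):
    print(f"fan-out {fan_out:>3}: median overall latency ~{median_latency(fan_out):.1f} ms")
```

With a fan-out of 1, the median request sees the median call. As the fan-out grows, the median request increasingly sees what would have been a high-percentile call on its own, because every request waits on its slowest response.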
This is a life-changing video by Gil Tene on this topic. The bottom line is that, because of the number of requests needed to support a single customer "request", the high percentiles are the ones that actually matter.

https://www.youtube.com/watch?v=lJ8ydIuPFeU&list=WL&index=22...