by bostik
1755 days ago
In my previous job we, like latency-sensitive engineering teams in general, mostly went with just four core latency measurements.[ß]

- p50, to see the baseline
- p95, to see the most common latency peaks
- p99, to see what the "normal" waiting times under load were
- max, because that's what the most unfortunate customers experienced
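
For illustration, here is a minimal sketch of how those four numbers could be computed per request route. It is not the tooling we actually used: the function names, the `(route, latency_ms)` sample format and the nearest-rank percentile definition are all assumptions made for the example.

```python
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is an integer 1..100."""
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, so no floating-point rounding surprises.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]


def summarize(samples):
    """Turn (route, latency_ms) pairs into {route: {p50, p95, p99, max}}."""
    by_route = defaultdict(list)
    for route, latency_ms in samples:
        by_route[route].append(latency_ms)

    summary = {}
    for route, values in by_route.items():
        values.sort()
        summary[route] = {
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
            "max": values[-1],  # the single worst request on this route
        }
    return summary


# Hypothetical data: 100 requests to /bet taking 1..100 ms, one to /login.
samples = [("/bet", ms) for ms in range(1, 101)] + [("/login", 7)]
summary = summarize(samples)
```

Real monitoring stacks usually estimate percentiles from histograms rather than sorting raw samples, so treat this only as a statement of what the numbers mean.
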

In a normal distributed system the spread between p99 and max can be enormous, but the mindset of ensuring a smooth customer experience, with the awareness that a real person had to wait that long, is exceptionally useful. You need just one slightly slower service for the worst-case latency to skyrocket. In particular, GraphQL is exceptionally bad at this without real discipline - the minimum request latency is dictated by the SLOWEST downstream service.

To be fair, it was a real-time gambling operation. And we were operating within the first Nielsen threshold (roughly 100 ms).

ß: bucketing these by request route was quite useful.

EDIT: formatting
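
To put rough numbers on the "slowest downstream service" point, here is a small sketch. It assumes the request calls all of its downstream services in parallel and has to wait for every one of them, and that slow responses are independent across services; the function names and figures are made up for the example.

```python
def fanout_latency_ms(downstream_ms):
    """Latency of a request that waits on all parallel downstream calls.

    Assumption: calls run concurrently, so the slowest one sets the floor.
    """
    return max(downstream_ms.values())


def chance_of_hitting_tail(n_services, tail_fraction=0.01):
    """Probability that at least one of n independent calls lands in its tail.

    tail_fraction=0.01 means "slower than that service's own p99".
    """
    return 1 - (1 - tail_fraction) ** n_services


# Hypothetical per-service latencies for one request, in milliseconds.
request = {"accounts": 12, "odds": 15, "wallet": 480}
total = fanout_latency_ms(request)      # 480: the wallet call dominates
risk = chance_of_hitting_tail(10)       # about 0.096 for ten services
```

So with ten downstream calls, nearly one request in ten sees at least one service at or beyond its own p99, which is why the request-level tail ends up so much worse than any single service's numbers suggest.
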