| HN Mirror

In my previous job me and latency-sensitive engineering teams in general mostly went with just four core latency measurements.[ß]

    - p50, to see the baseline
    - p95, to see the most common latency peaks
    - p99, to see what the "normal" waiting times under load were
    - max, because that's what the most unfortunate customers experienced

In a normal distributed system the spread between p99 and max can be enormous, but the mental mindset of ensuring smooth customer experience, with awareness that a real person had to wait that long, is exceptionally useful. You need just one slightly slower service for the worst-case latency to skyrocket. In particular, GraphQL is exceptionally bad at this without real discipline - the minimum request latency is dictated by the SLOWEST downstream service.

To be fair, it was a real time gambling operation. And we were operating within the first Nielsen threshold.

ß: bucketing these by request route was quite useful.

EDIT: formatting