by dwohnitmok 1755 days ago
I think the article is missing one big reason why we care about 99.9th or 99.99th percentile latency metrics: we can have high latency spikes even at low utilization.

The majority of computer systems do not deal with high utilization. As has been pointed out many times, computers are really fast these days, and many businesses could get by for their entire lifetime on a single machine if the underlying software made efficient use of the hardware. And yet even at low utilization, high latency still occurs often enough to frustrate users. Why is that? Because a lot of software these days is based on a design that intersperses low-latency operations with occasional high-latency ones. This shows up everywhere: garbage collection, disk and memory fragmentation, growable arrays, eventual consistency, soft deletions followed by actual hard deletions, etc.
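
To make the growable-array case concrete, here is a minimal sketch that counts element writes per append for a doubling array. The function name, the doubling policy, and the "one write = one unit of cost" model are my own assumptions for illustration, not any particular runtime's implementation.

```python
def append_costs(n_appends, initial_capacity=1):
    """Return the cost (in element writes) of each append to a doubling array.

    Hypothetical cost model: 1 write for the new element, plus 1 write per
    existing element whenever the backing buffer is full and must be copied.
    """
    capacity = initial_capacity
    size = 0
    costs = []
    for _ in range(n_appends):
        cost = 1  # writing the new element
        if size == capacity:
            cost += size   # copy every existing element into a larger buffer
            capacity *= 2  # doubling growth policy (an assumption)
        size += 1
        costs.append(cost)
    return costs


costs = append_costs(1025)
typical = sorted(costs)[len(costs) // 2]  # median cost of one append
worst = max(costs)                        # cost of the slowest append
average = sum(costs) / len(costs)         # amortized cost per append
print(typical, worst, average)
```

The amortized cost stays below 3 writes per append and the median append costs 1 write, but the append that triggers the last resize costs 1025. Nothing about that spike depends on how busy the machine is.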

What this article is advocating for is essentially an amortized analysis of throughput and latency, in which case you do get a nice, steady relationship between utilization and latency. But a large fraction of software running on modern hardware never comes close to fully utilizing it. For those systems the amortized analysis is not very valuable, because even at very low utilization you can get very different latency distributions depending on the software design described above and how you tune it.

This is why many software systems don't care about the median or average latency, but do care about the 99th or 99.9th percentile latency. There is a utilization-independent component to the statistical distribution of your latency over time, and for the many systems with low hardware utilization, that component, not utilization, is the main determinant of the overall latency profile.
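
A small sketch of why the tail percentiles show this and the median does not. The latency numbers are made up, and the nearest-rank percentile helper is just one common definition that I picked for simplicity.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Made-up latencies in ms: 980 requests on the fast path and 20 that hit
# an occasional slow operation (a GC pause, a resize, a compaction...).
latencies = [1.0] * 980 + [250.0] * 20

median = percentile(latencies, 50)
mean = statistics.mean(latencies)
p99 = percentile(latencies, 99)
p999 = percentile(latencies, 99.9)
print(median, mean, p99, p999)
```

With this data the median is 1 ms and the mean is about 6 ms, so both make the system look fine. The 99th and 99.9th percentiles are both 250 ms, which is what the unlucky users actually see.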

1 comments

Even worse, the effects that you mention (garbage collection, etc.) are morally equivalent to an increase in utilization, which pushes you towards the latency singularity that the article is talking about.

As an oversimplified example, suppose that your system is 10% utilized and that $BAD_THING (gc, or whatever) happens that effectively slows down the system by a factor of 10, at least temporarily. Your latency does not go up by 10x; it grows without bound, because your effective utilization is now 100%.
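
A rough way to check the arithmetic, assuming the simplest textbook queue (M/M/1: random arrivals, one server), where mean time in the system is 1 / (service rate - arrival rate). The queueing model and the rates below are my assumptions, not something the article or the comment above specifies.

```python
import math


def mean_latency(arrival_rate, service_rate):
    """Steady-state mean time in system for an M/M/1 queue, in seconds.

    Returns infinity when utilization (arrival_rate / service_rate) is at
    or above 1, since the queue then has no steady state.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


service_rate = 100.0  # requests/second the system can serve (made-up)
arrival_rate = 10.0   # requests/second arriving, i.e. 10% utilization

normal = mean_latency(arrival_rate, service_rate)            # healthy system
slowed_5x = mean_latency(arrival_rate, service_rate / 5)     # 50% utilization
slowed_10x = mean_latency(arrival_rate, service_rate / 10)   # 100% utilization
print(normal, slowed_5x / normal, slowed_10x)
```

Under these assumptions a 5x slowdown already costs about 9x in mean latency, and the 10x slowdown has no finite steady-state answer. In practice the slowdown is temporary, so the queue does not literally grow forever; it grows for as long as $BAD_THING lasts and then has to drain.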