by aetherson 4876 days ago
TL;DR: Average anything is a terrible way to track anything. (And median or mode are bad, too). Any single-scalar value that compresses information that is best expressed as a graph (or multiple graphs!) is immensely lossy to the point where arguably it obfuscates more than it makes clear.

Back when we had to live with printing-press methods of displaying information (i.e., where anything that wasn't pure text was very difficult to display), mean/median/mode numbers were a necessary evil. But if you're looking at a computer screen, there's really no reason to subject yourself to an abstraction that throws out 90% of your data.

5 comments

This was one of the more interesting realizations when I was an undergraduate writing my first research paper. We were testing latency of MIDI interfaces, and after sanity checking by looking at some of the underlying data, realized that average, or even average+stddev, was obscuring a lot of stuff. For example, note-to-note consistency is a major issue in music interfaces, often more important than absolute latency, since the spacing between notes is very important to melody perception (games often have a similar issue).

Showing the full histogram isn't a full solution either, though. Not only does using the average latency obscure the issue by boiling it down to a single scalar, but the full histogram of latencies also loses the information on note-to-note consistency! That's because a latency histogram loses sequencing information, so it doesn't distinguish between the case where you had a lot of 20ms latencies in a row followed by a lot of 50ms latencies in a row, and the case where every other message oscillated between 20ms and 50ms latencies (much worse). You can try to capture some of that information by making a histogram of adjacent-latency deltas, as one attempt. Or you can capture a different view on it by plotting latency vs. time and looking for spikes (but that can obscure less-obvious trends, and is unwieldy as a data representation if you're trying to summarize a system's behavior over a period of hours).
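
To make the sequencing point concrete, here is a minimal sketch in Python. The latency values are made up for illustration (they are not from the paper), and `adjacent_deltas` is a hypothetical helper name. It shows two traces with identical latency histograms whose adjacent-delta histograms differ sharply.

```python
from collections import Counter


def adjacent_deltas(latencies):
    """Absolute change in latency between each consecutive pair of messages."""
    return [abs(b - a) for a, b in zip(latencies, latencies[1:])]


# Hypothetical data: the same latencies (in ms), in two different orders.
blocky = [20] * 5 + [50] * 5   # a run of 20 ms, then a run of 50 ms
oscillating = [20, 50] * 5     # alternates on every message

# The plain latency histograms cannot tell the two traces apart.
same_histogram = Counter(blocky) == Counter(oscillating)

# The histograms of adjacent deltas can.
blocky_deltas = Counter(adjacent_deltas(blocky))
oscillating_deltas = Counter(adjacent_deltas(oscillating))

print(same_histogram)       # True
print(blocky_deltas)        # eight deltas of 0 ms, one of 30 ms
print(oscillating_deltas)   # nine deltas of 30 ms
```

The delta histogram is still a summary, so it has blind spots of its own; it just keeps one piece of ordering information that the plain histogram discards.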

The paper is here, though the actual numbers are 9 years old at this point, so probably not that useful: http://www.cs.hmc.edu/~bthom/res/midi_timing/publications/IC...

> Average anything is a terrible way to track anything.

Came here to say exactly this. And averages are especially insidious when used for data that doesn't have a symmetric distribution, like most latencies.
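
A quick sketch of why skew matters, using made-up numbers (95 fast requests and 5 very slow ones, all hypothetical):

```python
from statistics import mean, median

# Hypothetical, heavily skewed latencies in ms: most requests are fast,
# a few are very slow.
latencies = [10] * 95 + [1000] * 5

avg = mean(latencies)    # pulled upward by the slow tail
mid = median(latencies)  # the "typical" request

# Fraction of requests that were faster than the average.
below_avg = sum(1 for x in latencies if x < avg) / len(latencies)

print(avg, mid, below_avg)
```

With these numbers the average is 59.5 ms, yet no request actually took anywhere near 59.5 ms: 95% took 10 ms and the rest took a full second.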

hi Steve,

Author here. I think most people on HN would echo your sentiment about averages wholesale ... But I wanted to go a little deeper into selecting a better alternative for operational monitoring.

It's easy to say "averages are bad" but harder to say "use X instead", and explain why. We tried. Do you think we did it?

Well, the title seems a bit childish (since obviously everybody on HN knows it's a terrible idea). Why don't you change the post title to more appropriately reflect what you were trying to propose as an alternative?
Additional standard statistics like mode, median, quartiles etc are really useful.

And you can always throw things into gnuplot to get a quick, exploratory look at things. It will at least give you sense of whether you're looking at a normal distribution, something skewed, multi-modal distributions etc etc.

Hi, author here.

I am in complete agreement. Unfortunately, a lot of monitoring and APM tools still lead with average response time as one of the top-level metrics. And a lot of people still make incorrect assumptions based on it.

Although, the percentile on average latency is not great either. I try to make the case for using a metric that counts acceptable experiences vs. their latency value, e.g. the Apdex index or our derived sat score.

Best, Mike
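
For readers who haven't met Apdex: the standard score counts requests at or under a threshold T as satisfied, requests between T and 4T as tolerating (worth half), and everything slower as frustrated. A minimal sketch follows, with a hypothetical function name and made-up sample data. It illustrates the public Apdex formula only, not the derived sat score mentioned above, which isn't defined in this thread.

```python
def apdex(latencies_ms, threshold_ms):
    """Apdex score: (satisfied + tolerating / 2) / total samples."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for x in latencies_ms if x <= threshold_ms)
    tolerating = sum(
        1 for x in latencies_ms if threshold_ms < x <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(latencies_ms)


# Hypothetical response times in ms, scored against T = 500 ms:
# 3 satisfied, 2 tolerating, 1 frustrated.
samples = [100, 200, 400, 600, 1500, 2500]
score = apdex(samples, threshold_ms=500)
print(round(score, 3))
```

The result is still a single scalar, but it counts experiences against a threshold rather than averaging raw latency values, so one enormous outlier can't drag it around.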

I think The Tech Report has my favorite benchmark graphs. They sort the data points by latency so you can intuitively see the distribution of your samples. For example: http://techreport.com/review/24022/does-the-radeon-hd-7950-s...

I almost completely agree with you. I often tell people that statistics is the study of compressing information in useful ways. That said, scalar statistics can be very useful if the compression is 'correct'. For example, if you have an a priori reason to believe a distribution will be Gaussian (a very common situation, and an assumption that basically allowed statistics to grow to where it is today), mean and variance will fully describe the distribution. Many other common distributions can be fully described by a small number of parameters.
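
As a small illustration of that point, here is a sketch using Python's standard-library `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are made-up parameters. If the Gaussian assumption really holds, those two numbers are enough to recover any percentile or tail probability.

```python
from statistics import NormalDist

# Hypothetical Gaussian: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Any percentile can be recovered from the two parameters alone.
p95 = dist.inv_cdf(0.95)

# So can the fraction of samples within one standard deviation of the mean.
within_one_sigma = dist.cdf(115) - dist.cdf(85)

print(round(p95, 2), round(within_one_sigma, 4))
```

The catch is the assumption itself: if the real data is skewed or multi-modal, the same two numbers describe a distribution you don't actually have.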