by vlmutolo 1648 days ago
> However, in the future I would pick a different visualization I think

I think the box plots were a good choice here. I quickly understood what I was looking at, which is a high compliment for any visualization. When it's done right it seems easy and obvious.

But the y-axis really needs to start at 0. It's the only way the reader will perceive the correct relative difference between the various measurements.

As an extreme example, if I have measurements [A: 100, B: 101, C: 105], and then scale the axes to "fit around" the data (maybe from 100 to 106 on the y-axis), it will seem like C is 5x larger than B. In reality, it's only about 1.04x larger.
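
To put numbers on that example, here is a minimal sketch. The helper name `apparent_ratio` is made up for illustration, and it assumes the reader judges size by distance from the bottom of the axis, which is a simplification of how people actually read a chart.

```python
def apparent_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn heights of a and b when the y-axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)


# Measurements taken from the example above.
measurements = {"A": 100, "B": 101, "C": 105}
b, c = measurements["B"], measurements["C"]

# With the axis anchored at 0, drawn heights match the actual values.
true_ratio = apparent_ratio(c, b)

# With the axis cut off at 100, only the part above 100 is drawn.
truncated_ratio = apparent_ratio(c, b, axis_min=100)

print(f"axis from 0:   C looks {true_ratio:.2f}x the size of B")
print(f"axis from 100: C looks {truncated_ratio:.2f}x the size of B")
```

The drawn ratio only equals the real ratio when `axis_min` is 0; any other starting point changes it.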

Leave the whitespace at the bottom of the graph if the relative size of the measurements matters (it usually does).


>> It's the only way the reader will perceive the correct relative difference...

Every day, the stock market either goes from the bottom of the graph to the top, or from the top all the way to the bottom. Sometimes it takes a wild excursion covering the whole graph and then retreats a bit toward the middle. Every day. Because the media likes graphs that dramatize even a 0.1 percent change.

No, the media just happens to sometimes share OP’s intent: to show a (small) absolute change. That change may or may not be as dramatic as the graph suggests in both visualizations: measured in Kelvin, your body temperature increasing by 8 K looks like a tiny bump when you anchor it at absolute zero. “You” being the generic “you”, because at 45 °C body temperature, the other you is dead.

It will be visible if you work in Celsius, a unit that is essentially a cut-off Y axis to better fit the origin within the domains we use it for.
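
A rough sketch of the Kelvin versus Celsius point, assuming a normal body temperature of 37 °C (my assumption, not a number from the thread). The same 8-degree rise is a very different relative change depending on where the scale puts its zero:

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def relative_change(before, after):
    """Fractional change relative to the starting value."""
    return (after - before) / before


normal_c = 37.0            # assumed normal body temperature in Celsius
fever_c = normal_c + 8.0   # the 8 K rise from the comment above

normal_k = normal_c + KELVIN_OFFSET
fever_k = fever_c + KELVIN_OFFSET

rel_c = relative_change(normal_c, fever_c)
rel_k = relative_change(normal_k, fever_k)

print(f"Celsius: {rel_c:.1%} increase")
print(f"Kelvin:  {rel_k:.1%} increase")
```

The absolute difference is 8 in both units; only the ratio to the origin changes, which is exactly what a zero-anchored axis encodes.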

The change still needs context.

We have an intuitive sense of what 30 degrees is, assuming it is in our preferred system of measurement.

A stock market graph really should be showing the percentage change, not some small absolute change that isn’t immediately understood by the typical layperson.

This notion about cut-off y-axes is the data visualization equivalent of “correlation is not causation”: it’s a valid point that’s easily understood, so everyone latches on to it and then uses it to prove their smartitude, usually with the intonation of revealing grand wisdom.

Meanwhile, there are plenty of practitioners who aren’t oblivious to the argument, but rather long past it: they know there are situations where it’s totally legitimate to cut the axis. Other times, they might resort to a logarithmic axis, which is yet another method of making the presentation more sensitive to small changes.

There are plenty of instances where it's appropriate to use a y-axis that isn't "linear starting at zero." That's why I specified that I was only talking about ways to represent relative differences (i.e. relative to the magnitude of the measurements).

In this case, when we're measuring the latency of requests, without any other context, it's safe to say that relative differences are the important metric and the graph should start at zero.

So while it's true that this isn't universally the correct decision, and it's probably true that people regurgitate the "start at zero" criticism regardless of whether it's appropriate, it does apply to this case.

I think these choices are more context-specific than is often appreciated. For example:

> if I have measurements [A: 100, B: 101, C: 105], and then scale the axes to "fit around" the data (maybe from 100 to 106 on the y-axis), it will seem like C is 5x larger than B. In reality, it's only about 1.04x larger.

If you are interested in the absolute difference between the values, then starting your axis at 0 is going to make the chart hard to read.

It is, however, very rare that absolute differences matter; and even when they do, the scale should (often) be fixed. For example, the temperatures:

[A: 27.0, B: 29.0, C: 28.0]

versus:

[A: 27.0, B: 27.2, C: 26.9]

If the scale is fit to the min and max values, both charts will fill the full height of the plot and look equally dramatic.
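
A small sketch of that effect, using the two temperature series above. The helper names and the fixed 26 to 30 range are hypothetical choices for illustration:

```python
def to_axis_fraction(values, axis_min, axis_max):
    """Map each value to its vertical position on the plot (0 = bottom, 1 = top)."""
    return [(v - axis_min) / (axis_max - axis_min) for v in values]


def spread(fractions):
    """Share of the plot height covered by the data."""
    return max(fractions) - min(fractions)


series_1 = [27.0, 29.0, 28.0]
series_2 = [27.0, 27.2, 26.9]

# Axis fit to each series' own min and max.
fitted_1 = to_axis_fraction(series_1, min(series_1), max(series_1))
fitted_2 = to_axis_fraction(series_2, min(series_2), max(series_2))

# Same fixed axis for both series (hypothetical range).
FIXED_MIN, FIXED_MAX = 26.0, 30.0
fixed_1 = to_axis_fraction(series_1, FIXED_MIN, FIXED_MAX)
fixed_2 = to_axis_fraction(series_2, FIXED_MIN, FIXED_MAX)

print("fitted axis:", spread(fitted_1), spread(fitted_2))
print("fixed axis: ", round(spread(fixed_1), 3), round(spread(fixed_2), 3))
```

With the fitted axis both series cover the whole plot, so a 2.0-degree swing and a 0.3-degree swing are indistinguishable at a glance; with the fixed axis the second series is visibly flatter.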

Still, as a rule of thumb, when the y-axis doesn't start at 0, the chart is probably misleading. It is very rare that the absolute size of the measured quantity doesn't matter.

Yeah, you should graph both starting at 0 K, right? You wouldn't want to mislead people into thinking something at 10 °C is ten times hotter than something at 1 °C.

Indeed. And if they don't, you are probably better off normalizing your axis anyway.

Agreed. Next time I'll make the text and other things a little larger too (the real graphs are actually quite large; I had to shrink them to fit the article formatting). I'd already spent so much time on the article that I didn't want to go back and redo the graphs (I didn't really think too many people would read it; it was a big surprise to see it on HN).