| HN Mirror

by vlmutolo 1648 days ago

> However, in the future I would pick a different visualization I think

I think the box plots were a good choice here. I quickly understood what I was looking at, which is a high compliment for any visualization. When it's done right it seems easy and obvious.

But the y-axis really needs to start at 0. It's the only way the reader will perceive the correct relative difference between the various measurements.

As an extreme example, if I have measurements [A: 100, B: 101, C: 105], and then scale the axes to "fit around" the data (maybe from 100 to 106 on the y-axis), it will seem like C is 5x larger than B. In reality, it's only about 1.04x larger.
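To make the arithmetic concrete, here is a minimal sketch that compares drawn bar heights under a truncated axis and a zero-based axis. The helper name `apparent_ratio` is made up for illustration; the measurements are the ones from the example.

```python
def apparent_ratio(a, b, axis_min):
    """Ratio of the drawn bar heights of a and b when the y-axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)


measurements = {"A": 100, "B": 101, "C": 105}

# Axis "fit around" the data: the baseline sits at 100.
truncated = apparent_ratio(measurements["C"], measurements["B"], axis_min=100)

# Axis starting at zero: bar heights are proportional to the values themselves.
zero_based = apparent_ratio(measurements["C"], measurements["B"], axis_min=0)

print(f"axis from 100: C is drawn {truncated:.2f}x as tall as B")
print(f"axis from 0:   C is drawn {zero_based:.2f}x as tall as B")
```

With the baseline at 100 the drawn ratio is 5.00; with the baseline at 0 it is about 1.04, which matches the actual ratio of the values.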

Leave the whitespace at the bottom of the graph if the relative size of the measurements matters (it usually does).

4 comments

phkahler 1648 days ago

>> It's the only way the reader will perceive the correct relative difference...

Every day, the stock market either goes from the bottom of the graph to the top, or from the top all the way to the bottom. Sometimes it takes a wild excursion covering the whole graph and then retreats a bit toward the middle. Every day. Because the media likes graphs that dramatize even a 0.1 percent change.

KarlKemp 1648 days ago

No, the media just happens to sometimes share OP’s intent: to show a (small) absolute change. That change may or may not be as dramatic as the graph suggests in both visualizations: measured in Kelvin, your body temperature increasing by 8 K looks like a tiny bump when you anchor it at absolute zero. “You” being the generic “you”, because at a body temperature of 45 °C, the other you is dead.

It will be visible if you work in Celsius, a unit that is essentially a cut-off Y axis to better fit the origin within the domains we use it for.
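A rough sketch of the same point in numbers, assuming a normal body temperature of 37 °C (that baseline is my assumption, not something stated above). It compares the 8-degree rise to the distance from each scale's zero.

```python
KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin

baseline_celsius = 37.0  # assumed normal body temperature
rise = 8.0               # the same size in K and in °C


def relative_rise(baseline, rise):
    """Size of the rise as a fraction of the distance from the axis origin."""
    return rise / baseline


rel_celsius = relative_rise(baseline_celsius, rise)
rel_kelvin = relative_rise(baseline_celsius + KELVIN_OFFSET, rise)

print(f"axis anchored at 0 °C: rise is {rel_celsius:.1%} of the baseline height")
print(f"axis anchored at 0 K:  rise is {rel_kelvin:.1%} of the baseline height")
```

Under that assumption the rise is roughly 21.6% of the bar height on a Celsius axis but only about 2.6% on a Kelvin axis, even though the physical change is identical.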

wbsss4412 1648 days ago

The change still needs context.

We have an intuitive sense of what 30 degrees is, assuming it is in our preferred system of measurement.

A stock market graph really should be showing the percentage change, not some small absolute change that isn’t immediately understood by the typical layperson.

KarlKemp 1648 days ago

This notion about cut-off y-axes is the data visualization equivalent of “correlation is not causation”: it’s a valid point that’s easily understood, so everyone latches on to it and then uses it to prove their smartitude, usually with the intonation of revealing grand wisdom.

Meanwhile, there are plenty of practitioners who aren’t oblivious to the argument, but rather long past it: they know there are situations where it’s totally legitimate to cut the axis. Other times, they might resort to a logarithmic axis, which is yet another method of making the presentation more sensitive to small changes.

vlmutolo 1648 days ago

There are plenty of instances where it's appropriate to use a y-axis that isn't "linear starting at zero." That's why I specified that I was only talking about ways to represent relative differences (i.e. relative to the magnitude of the measurements).

In this case, when we're measuring the latency of requests, without any other context, it's safe to say that relative differences are the important metric and the graph should start at zero.

So while it's true that this isn't universally the correct decision, and it's probably true that people regurgitate the "start at zero" criticism regardless of whether it's appropriate, it does apply to this case.

remus 1648 days ago

I think these choices are more context specific than is often appreciated. For example

> if I have measurements [A: 100, B: 101, C: 105], and then scale the axes to "fit around" the data (maybe from 100 to 106 on the y-axis), it will seem like C is 5x larger than B. In reality, it's only about 1.04x larger.

If you were interested in the absolute difference between the values then starting your axis at 0 is going to make it hard to read.

amenod 1648 days ago

It is however very rare that absolute differences matter; and even when they do, the scale should (often) be fixed. For example the temperatures:

[A: 27.0, B: 29.0, C: 28.0]

versus:

[A: 27.0, B: 27.2, C: 26.9]

If scale is fit to the min and max values, the charts will look the same.
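Here is a small sketch of that effect, using the two temperature sets above. The helper `to_plot_height` and the fixed 26 to 30 range are hypothetical choices for illustration only.

```python
def to_plot_height(values, axis_min, axis_max):
    """Map each value to a fraction (0..1) of the plot height for the given axis range."""
    span = axis_max - axis_min
    return [(v - axis_min) / span for v in values]


series_1 = [27.0, 29.0, 28.0]
series_2 = [27.0, 27.2, 26.9]

# Axis fit to each series' own min and max: both fill the full plot height.
fit_1 = to_plot_height(series_1, min(series_1), max(series_1))
fit_2 = to_plot_height(series_2, min(series_2), max(series_2))

# One fixed axis shared by both charts (range chosen arbitrarily here).
fixed_1 = to_plot_height(series_1, 26.0, 30.0)
fixed_2 = to_plot_height(series_2, 26.0, 30.0)

for name, heights in [("fit 1", fit_1), ("fit 2", fit_2),
                      ("fixed 1", fixed_1), ("fixed 2", fixed_2)]:
    print(name, [round(h, 3) for h in heights])
```

With the fitted axes, both series swing from 0 to 1 even though one varies by 2.0 degrees and the other by about 0.3. On the shared fixed axis, the second series occupies a much narrower band than the first, so the difference in variation stays visible.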

Still, as a rule of thumb, when the y-axis doesn't start at 0, the chart is probably misleading. It is very rare that the absolute size of the measured quantity doesn't matter.

Tyr42 1648 days ago

Yeah, you should graph both starting at 0 K, right? You wouldn't want to mislead people into thinking something at 10 °C is ten times hotter than something at 1 °C.

sandgiant 1648 days ago

Indeed. And if they don't, you are probably better off normalizing your axis anyway.

eric_trackjs 1648 days ago

Agreed. Next time I'll make the text and other things a little larger too (the real graphs are actually quite large; I had to shrink them to fit the article formatting). I'd already spent so much time on the article that I didn't want to go back and redo the graphs. I didn't really think too many people would read it, so it was a big surprise to see it on HN.
