by achenatx 1738 days ago
I've been trying to get the marketing team to always include a std deviation with averages. An average alone is simply not useful; the standard deviation is a simple way to essentially include percentiles.
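For reference, a minimal Python sketch of what "mean plus std deviation as percentiles" amounts to: fit a normal distribution with the sample mean and standard deviation (the standard library's `statistics.NormalDist.from_samples`) and read percentiles off that fitted curve. The helper name `normal_implied_percentiles` is hypothetical, and the numbers only track the real percentiles if the data are roughly normal.

```python
from statistics import NormalDist


def normal_implied_percentiles(data, ps=(0.05, 0.5, 0.95)):
    """Percentiles implied by the sample mean and standard deviation alone.

    Assumes the data are roughly normal: the percentiles come from a
    fitted normal curve, not from the data themselves.
    """
    fitted = NormalDist.from_samples(data)  # uses sample mean and stdev
    return {p: fitted.inv_cdf(p) for p in ps}


print(normal_implied_percentiles([1, 2, 3, 4, 5]))
```

Under that normal assumption, mean ± 1.645 standard deviations brackets roughly the middle 90% of values.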

They regularly compare experiments to the mean but don't use a t-test to check whether the results are actually different from the mean.
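A minimal sketch of the one-sample t statistic for checking whether an experiment's mean differs from a reference mean, assuming the reference is a fixed baseline rather than another noisy sample. `one_sample_t` is a hypothetical helper; turning t into a p-value needs the Student t CDF, which the standard library lacks (SciPy's `scipy.stats.ttest_1samp` does the whole thing).

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, reference_mean):
    """Return (t, degrees_of_freedom) for sample mean vs. reference_mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of the mean, using the sample standard deviation.
    se = stdev(sample) / math.sqrt(n)
    if se == 0:
        raise ValueError("sample has no variance")
    t = (fmean(sample) - reference_mean) / se
    return t, n - 1


t, df = one_sample_t([12, 15, 11, 14, 13], reference_mean=10)
print(f"t={t:.3f}, df={df}")
```

If the comparison is really experiment vs. a control group, a two-sample test such as Welch's (`scipy.stats.ttest_ind(a, b, equal_var=False)`) fits better.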


I heavily caution against the feeling that "standard deviation is a simple way to essentially include percentiles." How useful the standard deviation is depends on the distributions you are working with. Heavy-tailed distributions show up fairly often in practice, and mean plus standard deviation would not summarize those well. Madars' comment in this thread is a beautiful example of this: four completely different distributions with identical mean and standard deviation (among other things). For these reasons, histograms and percentiles, or approximations of them where necessary, are preferable.
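To make the heavy-tail point concrete, a small sketch comparing empirical percentiles with the ones a normal fit (mean plus std deviation) would predict. The latency numbers are made up for illustration and `compare_percentiles` is a hypothetical helper.

```python
from statistics import NormalDist, quantiles


def compare_percentiles(data, ps=(5, 50, 95, 99)):
    """Map each percentile p to (empirical value, normal-fit value)."""
    fitted = NormalDist.from_samples(data)
    cuts = quantiles(data, n=100)  # cuts[k - 1] is the k-th percentile
    return {p: (cuts[p - 1], fitted.inv_cdf(p / 100)) for p in ps}


# Hypothetical skewed "latency" sample: mostly fast, one very slow request.
latencies = [10] * 95 + [20] * 4 + [1000]
for p, (empirical, normal_fit) in compare_percentiles(latencies).items():
    print(f"p{p}: empirical={empirical:.1f} normal-fit={normal_fit:.1f}")
```

On this sample the normal fit places p5 below zero, an impossible latency, and puts p99 far below the observed value near 990, while its median (the mean, about 20) is roughly double the actual median of 10.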
I assume most of the distributions a marketing department deals with are roughly normal, in which case the std deviation is a great way to analyze the data. This is easy to verify by plotting the data and checking that the tails don't look weird.
I can't help but idly wonder what humans are doing when they eyeball the tails to see whether things look good. Say we wanted to do the eyeball test automatically: would the best way be to run an image classifier on the plot? Is there something magic about the plot representation that would make it useful even for computers?
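One way to automate the eyeball test without an image classifier is to score the normal Q-Q plot directly: correlate the sorted data with standard-normal quantiles, so normal-looking data (a straight line) scores near 1 and weird tails pull the score down. A sketch under those assumptions; `qq_correlation` is hypothetical, and the (i - 0.5)/n plotting positions are one convention among several.

```python
import math
from statistics import NormalDist, fmean


def qq_correlation(data):
    """Pearson correlation between sorted data and standard-normal quantiles."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points")
    std_normal = NormalDist()
    # Theoretical quantiles at plotting positions (i - 0.5) / n.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = fmean(xs), fmean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    if sxx == 0:
        raise ValueError("all values are identical")
    return sxz / math.sqrt(sxx * szz)


latencies = [10] * 95 + [20] * 4 + [1000]  # hypothetical heavy-tailed data
print(round(qq_correlation(latencies), 3))
```

What counts as "close enough to 1" depends on sample size; formal normality tests such as Shapiro-Wilk (`scipy.stats.shapiro`) handle that calibration.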
One thing I like about this post is that it explains things in an accessible way before getting into a deep dive. Might be worth sharing with the marketing team as they'll "get" long tail in the context of web search, so the concept is fairly transferable to stuff that they would know about.
NB: Post author here.

Std deviation definitely helps a lot, but it's still often not as good as percentiles. I was actually thinking about adding some of that to the post, but it was already getting so long. It's funny how the things you think are simple sometimes take the most effort to explain; I definitely found that with this one.

Yeah, std deviation has a similar problem to the mean: it doesn't give you the full picture unless the distribution is close to normal (Gaussian).
That's pretty much why summary statistics often include the IQR (interquartile range), which gives some idea of the skew and shape of the distribution as well.
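A minimal Python sketch of that quartile-based summary: Q1, median, Q3, the IQR, and Bowley's quartile skewness as a rough indicator of which way the middle half leans. `iqr_summary` is a hypothetical helper built on the standard library's `statistics.quantiles`.

```python
import math
from statistics import quantiles


def iqr_summary(data):
    """Quartiles, IQR and Bowley (quartile) skewness of a sample."""
    q1, q2, q3 = quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    # Bowley skewness: 0 when the middle half is symmetric around the median,
    # positive when Q3 sits further above the median than Q1 sits below it.
    skew = (q3 + q1 - 2 * q2) / iqr if iqr else math.nan
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "bowley_skew": skew}


print(iqr_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(iqr_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # same IQR despite the outlier
```

Different quantile conventions (the `method` argument here, or NumPy's defaults) can give slightly different quartiles on small samples. Unlike the std deviation, the IQR is unaffected here by swapping the largest value for an extreme one.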

Unfortunately, BD and marketing just want a single number showing that the value is bigger, and they hate anything more complicated than a bar chart.

NB: Post author here.

We've been meaning to add IQR as an accessor function for these; we may have to go back and do it. The frequency trails [1] work from Brendan Gregg also goes into some of this, and it's a really cool visualization.

[1]: https://www.brendangregg.com/FrequencyTrails/mean.html

A bar chart is basically your percentiles (just more of them), so why not show it? Box-and-whisker plots could be more complicated for them, but they show the same sort of data.
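Assuming "bars and whiskers" means a box-and-whisker plot, here is a sketch of the numbers behind one, using Tukey's convention: whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. `tukey_box` is hypothetical, it assumes a sample of more than a couple of points, and other whisker conventions (e.g. fixed percentiles) exist.

```python
from statistics import quantiles


def tukey_box(data, k=1.5):
    """Box-and-whisker numbers using Tukey's k * IQR fences."""
    xs = sorted(data)
    q1, median, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


print(tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```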
Bar charts across categorical data :P

That is, the first bar is "Our Number" and the second bar is "Competitor's number."

Someone clearly gets it. Variability viz and spread detract from that clear message.