by jordigh 4031 days ago
> Stop saying: “We’ve reached 95% statistical significance.”

> And start saying: “There’s a 5% chance that these results are total bullshit.”

Argh, no, no, no and no!

95% significance is NOT 95% probability! When you select a confidence level of 95%, the probability that your results are nonsense is ZERO or ONE. There is no probability statement associated with it. Just because something is unknown does not mean that you can make a probability statement about it, and the mathematics around statistical testing all depend on the assumption that the parameter being tested is not random, merely unknown...

Rather, 95% statistical significance means, we got this number from a procedure that 95% of the time produces the right thing, but we have no idea whether this particular number we got is correct or not.
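
To make the "procedure that is right 95% of the time" reading concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: normally distributed data, a known standard deviation, and the made-up names `coverage`, `true_mean`, `sigma`, `n` and `trials`. It repeats the interval-building procedure many times and counts how often the interval contains the fixed true mean. Each individual interval either contains it or does not; only the long-run fraction is close to 95%.

```python
import random
import statistics

def coverage(true_mean=10.0, sigma=2.0, n=50, trials=2000, seed=0):
    """Fraction of 95% z-intervals (known sigma) that contain true_mean."""
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% quantile of the standard normal
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build the interval around its mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        # true_mean is fixed, not random: this interval is simply right or wrong.
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials

print(coverage())  # a value close to 0.95
```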

UNLESS!

Unless you're doing Bayesian stats. But in that case your procedure looks completely different and produces very different probability intervals instead of confidence intervals, and you don't talk about statistical significance at all, but about raw probabilities.

3 comments

> 95% statistical significance means, we got this number from a procedure that 95% of the time produces the right thing

The original post is incorrect about the probabilistic interpretation of the 95% confidence interval, but this interpretation is also wrong.

In classical statistics, p<0.05 means that, if there is no difference in our sample populations (i.e. the null hypothesis), then the probability of observing a difference at least this extreme is less than 0.05.

I'm not really sure what you're trying to say.

> Rather, 95% statistical significance means, we got this number from a procedure that 95% of the time produces the right thing, but we have no idea whether this particular number we got is correct or not.

I.e. We got this number from a procedure and there's a 5% chance it didn't produce the right thing.

Nope. It's "If we did this infinitely many more times, 5% of those samples wouldn't have significant results". It's a subtle but important distinction.

Though I'm surprised that his advice wasn't "Report confidence intervals at least". There's much more meaningful information in a point estimate and confidence interval than in "p < 0.05".

Sorry, confidence intervals are just a different presentation of the same information as p-values, and don't contain any more or less information.

While built off the same information, and it's possible to do an ad hoc significance test off it, confidence intervals tell you more about the spread of the estimate. Especially if, as the author is suggesting, you're not even reporting the actual p-value, but just whether or not it's below a particular threshold.

If you have a few minutes to spare, I would very much welcome your thoughts so that I can either correct the article, or take it down. The last thing I want is for it to sit out there on the open internet as misinformation.

My goal was to create a framework which — while less mathematically accurate (hence “rhetorical device”) — helped convey the seriousness of making business decisions based on P = 0.05 to people for whom 95% statistical significance doesn’t mean anything. And clearly, based on reactions here, I failed at that goal.

So, if you’re game, I’ll quickly walk you through my thinking, and you can help me understand where I went wrong.

Best way to contact?

Rephrasing "95% probability of correctness" as "5% chance of bullshit" is perfectly fine, and a good way to look at things. The problem is that "p = 0.05" doesn't mean either of those things, or even anything close to either of those things. P-values are always talking about a null hypothesis, and only the null hypothesis. The p-value answers "How rare would this result be if the null hypothesis is true?" Note that the alternative hypothesis, which is what you really want to know about, never even enters the question. This is why people have such issues with p-values. People want to know about the alternative hypothesis, and they want to believe that the statistical tool they're using is answering their question, but a p-value is answering a different question entirely.

It's intuitively obvious that a result that is unlikely under the null hypothesis constitutes some evidence in favor of the alternative hypothesis, but the precise nature of that relationship depends on information that is not usually available, such as prior estimates of the likelihood that each model is true. If such information is available, you can use Bayesian statistics to answer the question that you really want to ask (e.g. "What is the probability that the alternative hypothesis is true given this data?"), instead of using p-values to answer the only question you are capable of answering, even though that answer isn't a particularly useful one.
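
A rough sketch of how much the prior matters, using Bayes' rule. The numbers are assumptions picked purely for illustration: a test with 80% power, a 0.05 significance threshold, and three made-up prior probabilities that the alternative is true. Note also that it conditions on "the test came out significant" rather than on one exact p-value, which is a simplification. The function name `prob_null_given_significant` is hypothetical.

```python
def prob_null_given_significant(prior_alt, power=0.8, alpha=0.05):
    """P(null is true | result was significant), by Bayes' rule.

    prior_alt: assumed prior probability that the alternative is true.
    power:     assumed P(significant | alternative is true).
    alpha:     P(significant | null is true).
    """
    true_positive = prior_alt * power
    false_positive = (1 - prior_alt) * alpha
    return false_positive / (true_positive + false_positive)

for prior_alt in (0.5, 0.1, 0.01):
    print(prior_alt, round(prob_null_given_significant(prior_alt), 3))
# 0.5 0.059
# 0.1 0.36
# 0.01 0.861
```

Same threshold, same test, and the "chance of bullshit" ranges from about 6% to about 86% depending only on the assumed prior.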

For a concrete example, xkcd comes to the rescue: https://xkcd.com/882/

Consider that, when testing the 20 flavors, you expect to get about one p-value at or below 0.05 by random chance alone, since 0.05 = 1 in 20. So in this specific case there's actually a very high probability (much higher than 5%, even higher than 50%) that the result is bullshit. But even when you're doing a single test, not 20 of them, a p-value of 0.05 can still mean a much higher than 5% chance of bullshit. Or it could be much lower.
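
The "higher than 50%" figure for the 20 flavors is quick arithmetic, sketched here under two assumptions: the 20 tests are independent, and every null hypothesis is actually true. The function name is made up for the example.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent true-null tests
    comes out significant at level alpha."""
    return 1 - (1 - alpha) ** num_tests

print(round(prob_at_least_one_false_positive(1), 3))   # 0.05
print(round(prob_at_least_one_false_positive(20), 3))  # 0.642
```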

Lastly, note that "confidence intervals" are just a statement of the thresholds for p-values. For example, the 95% confidence interval includes your null hypothesis if and only if your p-value is greater than 0.05. So everything I said above about p-values applies equally well to confidence intervals. In particular, "95% confidence interval" does NOT mean "95% confidence that the value is within this interval".
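
Here is a small sketch of that if-and-only-if relationship for the simplest case I could think of, a two-sided z-test on an estimate with a known standard error. The setup and the names `z_test_and_interval`, `estimate`, `std_error` and `null_value` are assumptions for the example, not anything from the article. `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist

def z_test_and_interval(estimate, std_error, null_value=0.0, alpha=0.05):
    """Return (two-sided p-value, confidence interval) for a z-test."""
    nd = NormalDist()
    z = (estimate - null_value) / std_error
    p_value = 2 * (1 - nd.cdf(abs(z)))
    crit = nd.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    interval = (estimate - crit * std_error, estimate + crit * std_error)
    return p_value, interval

for estimate in (0.5, 1.9, 2.0, 3.0):
    p, (lo, hi) = z_test_and_interval(estimate, std_error=1.0)
    # The interval covers the null value exactly when p is above 0.05.
    print(estimate, round(p, 4), lo <= 0.0 <= hi)
```

The interval and the p-value are computed from the same two numbers, which is why neither one can tell you the probability that the null (or the alternative) is true.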

If you want to ask me some more questions, email me at rct at thompsonclan dot org.

This is far and away the most helpful response I've gotten, thank you.

Will digest, edit, and probably hit you up with another question or two.

Thanks again