Hacker News
by rcthompson 4026 days ago
Rephrasing "95% probability of correctness" as "5% chance of bullshit" is perfectly fine, and a good way to look at things. The problem is that "p = 0.05" doesn't mean either of those things, or even anything close to either of those things. P-values are always talking about a null hypothesis, and only the null hypothesis. The p-value answers "How rare would a result at least this extreme be if the null hypothesis were true?" Note that the alternative hypothesis, which is what you really want to know about, never even enters the question. This is why people have such issues with p-values. People want to know about the alternative hypothesis, and they want to believe that the statistical tool they're using is answering their question, but a p-value is answering a different question entirely.

It's intuitively obvious that a result that is unlikely under the null hypothesis constitutes some evidence in favor of the alternative hypothesis, but the precise nature of that relationship depends on information that is not usually available, such as prior estimates of the probability that each hypothesis is true. If such information is available, you can use Bayesian statistics to answer the question that you really want to ask (e.g. "What is the probability that the alternative hypothesis is true given this data?"), instead of using p-values to answer the only question you are capable of answering, even though that answer isn't a particularly useful one.
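
To make that dependence on prior information concrete, here is a rough sketch in Python. It simplifies things by conditioning on "the test came out significant at alpha" rather than on an exact p-value, and the prior and power values are made-up assumptions for illustration, not numbers from any real study. The function name is hypothetical.

```python
def prob_alternative_given_significant(prior, power, alpha=0.05):
    """Posterior probability that the alternative is true, given a significant result.

    prior: assumed probability that the alternative is true before seeing data
    power: assumed probability of a significant result when the alternative is true
    alpha: probability of a significant result when the null is true
    """
    true_positive = power * prior            # alternative true AND significant
    false_positive = alpha * (1.0 - prior)   # null true AND significant
    return true_positive / (true_positive + false_positive)


# Same test, same alpha, three different (assumed) priors.
for prior in (0.5, 0.1, 0.01):
    posterior = prob_alternative_given_significant(prior, power=0.8)
    print(f"prior={prior:<5} P(alternative | significant)={posterior:.3f} "
          f"P(bullshit)={1 - posterior:.3f}")
```

With these assumed inputs the chance of bullshit runs from roughly 6% (prior of 0.5) to roughly 86% (prior of 0.01), even though the significance threshold is 0.05 every time.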

For a concrete example, xkcd comes to the rescue: https://xkcd.com/882/

Consider that, when testing the 20 flavors, you expect about one p-value at or below 0.05 by random chance alone, since 0.05 = 1 in 20. So in this specific case there's actually a very high probability (much higher than 5%, even higher than 50%) that the result is bullshit. But even when you're doing a single test, not 20 of them, a p-value of 0.05 can still mean a much higher than 5% chance of bullshit. Or it could be much lower.
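
The "higher than 50%" figure for the 20 flavors can be checked with a few lines of Python. This is a sketch that assumes the 20 tests are independent and that every null hypothesis is true (so each p-value is uniform on [0, 1]); the function names are mine.

```python
import random


def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Exact chance that at least one of num_tests independent true-null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_at_least_one_false_positive(num_tests, alpha=0.05, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis, a p-value is uniformly distributed on [0, 1].
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


exact = prob_at_least_one_false_positive(20)
estimate = simulate_at_least_one_false_positive(20)
print(f"exact: {exact:.3f}, simulated: {estimate:.3f}")
```

The exact value is 1 - 0.95^20, which is about 0.64, so with 20 flavors and no real effect you would see at least one "significant" result roughly two times out of three.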

Lastly, note that "confidence intervals" are just a statement of the thresholds for p-values. For example, the 95% confidence interval includes the value specified by your null hypothesis if and only if the p-value from the corresponding two-sided test is greater than 0.05. So everything I said above about p-values applies equally well to confidence intervals. In particular, "95% confidence interval" does NOT mean "95% confidence that the value is within this interval".
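
Here is a small sketch of that equivalence, using the simplest case I can think of: a two-sided z-test for a mean with a known standard deviation. The sample means, sigma, and n are invented for illustration, and `z_test` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Return the two-sided p-value and the matching (1 - alpha) confidence interval."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    p_value = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    z_crit = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    interval = (sample_mean - z_crit * standard_error,
                sample_mean + z_crit * standard_error)
    return p_value, interval


# Made-up sample means, tested against a null hypothesis of mean = 0.
for sample_mean in (0.10, 0.19, 0.20, 0.30):
    p_value, (low, high) = z_test(sample_mean, null_mean=0.0, sigma=1.0, n=100)
    null_inside = low <= 0.0 <= high
    print(f"mean={sample_mean:.2f} p={p_value:.4f} "
          f"CI=({low:.3f}, {high:.3f}) null inside CI: {null_inside}")
```

In every row, the null value sits inside the 95% interval exactly when p is greater than 0.05, because both are computed from the same z statistic and the same cutoff.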

If you want to ask me some more questions, email me at rct at thompsonclan dot org.

1 comment

This is far and away the most helpful response I've gotten, thank you.

Will digest, edit, and probably hit you up with another question or two.

Thanks again