by roland_nilsson 698 days ago
I think this problem is at least in part due to the hypothesis testing concept itself. Classical hypothesis testing is asymmetric: there is a "null" hypothesis, which is typically the uninteresting/useless case, and an "alternative" hypothesis, which is the one you would like to be true. Critically, you cannot determine whether the data _supports_ the null hypothesis, only whether the data _rejects_ it (and thereby supports the alternative). A so-called "null result" occurs when the data is not sufficient to reject the null hypothesis; then you can't tell whether you actually have a useful finding (for example, that there is no major difference between species A and B) or a failed experiment (the data is so bad/noisy that we cannot conclude anything). And so you end up in the unfortunate situation where you either succeed in rejecting the null in favor of your favorite hypothesis and get your degree / promotion / tenure, or you have nothing.
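A minimal sketch of that ambiguity, using hypothetical numbers: two experiments comparing A and B, each summarized only by an observed mean difference and its standard error, and tested with a two-sided z-test (an assumption for simplicity; the names `z_test` and `experiments` are illustrative). One experiment measures a small difference precisely, the other a large difference noisily.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test(diff: float, se: float, level: float = 0.95):
    """Two-sided z-test of H0: true difference == 0, plus a confidence interval.

    Assumes the observed difference is approximately normal with known
    standard error `se` (a simplifying assumption for this sketch).
    """
    z = diff / se
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical summaries: (observed mean difference A - B, standard error)
experiments = {
    "precise": (0.1, 0.1),  # small difference, measured precisely
    "noisy": (1.0, 1.0),    # large difference, measured noisily
}

for name, (diff, se) in experiments.items():
    p, (lo, hi) = z_test(diff, se)
    print(f"{name:<8} p = {p:.3f}  95% CI = [{lo:.2f}, {hi:.2f}]")
```

Both experiments give the same p-value (about 0.317), so both are "null results" at the usual 5% level. Only the intervals, roughly [-0.10, 0.30] versus [-0.96, 2.96], show that the first one actually pins the difference down as small while the second one tells us almost nothing.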

This happens because hypothesis testing conflates effect size (how big the difference between A and B is) with the uncertainty about that effect size (significance/reproducibility): a small difference measured precisely and a large difference measured noisily can yield the same p-value. Confidence intervals are more useful IMHO, as they help untangle these two aspects, for example by showing that the difference between A and B is small _and_ reproducible. Bayesian analysis is also a major improvement, as it allows examining both the "null" and "alternative" hypotheses on equal terms, as well as reasoning explicitly about our prior beliefs / biases. Unfortunately, many areas of science are still stuck with statistical methods from the early 1900s.
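To make the "equal terms" point concrete, here is a minimal sketch of one simple Bayesian setup (not the only one): a Bayes factor comparing H0, "the true difference is exactly 0", against H1, "the true difference is drawn from a normal prior N(0, PRIOR_SD^2)". It assumes a normally distributed observed difference with known standard error; `bayes_factor_01`, `PRIOR_SD` and the example numbers are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bayes_factor_01(diff: float, se: float, prior_sd: float) -> float:
    """Bayes factor BF01 = p(data | H0) / p(data | H1).

    H0: true difference == 0, so the observed difference ~ N(0, se^2).
    H1: true difference ~ N(0, prior_sd^2), so, integrating it out,
        the observed difference ~ N(0, se^2 + prior_sd^2).
    BF01 > 1 favors the null; BF01 < 1 favors the alternative.
    """
    like_h0 = NormalDist(0.0, se).pdf(diff)
    like_h1 = NormalDist(0.0, sqrt(se ** 2 + prior_sd ** 2)).pdf(diff)
    return like_h0 / like_h1


PRIOR_SD = 1.0  # hypothetical prior belief: effects under H1 are of order 1

# Hypothetical summaries: (observed mean difference A - B, standard error)
for name, (diff, se) in {"precise": (0.1, 0.1), "noisy": (1.0, 1.0)}.items():
    print(f"{name:<8} BF01 = {bayes_factor_01(diff, se, PRIOR_SD):.2f}")
```

Under these assumptions the precise experiment gives BF01 of about 6.1 (the data favor "no difference" roughly six to one), while the noisy one gives about 1.1, i.e. essentially no evidence either way, which is exactly the distinction a non-significant p-value cannot make. The answer depends on `PRIOR_SD`, which is where the prior beliefs enter, but at least they are stated explicitly.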