by mdbco 4129 days ago
The article is certainly correct that p-values and confidence intervals (or confidence sets, in multi-dimensional contexts) are widely misunderstood, not just in psychology or other social sciences, but in the hard sciences as well. The problem is even worse when you look outside of academia at common practices in more applied settings.

As the article suggests, a good approach is to treat p-values not as conclusive or decisive, but as one tool that must be supplemented by other statistics. The article emphasizes Bayesian methods in particular, which can certainly provide additional information, but they are of limited use when priors are not well-defined or are entirely unknown, which is unfortunately the case in many problem domains.

One open question is how to draw the distinction mentioned in the conclusion between "preliminary research" and "confirmatory research", particularly in fields where statistics provide the primary evidence, such as psychology. Further studies in the same vein as the preliminary research can certainly provide additional supporting statistical evidence, but this doesn't escape the problem that all of the evidence is probabilistic in nature. The key issue is that because statistical approaches can only give probabilistic evidence that a hypothesis is correct, they cannot tell you what is certainly true, so even confirmatory research remains open to falsification. We wouldn't want the label "confirmatory research" to suggest to the public that a result is certainly correct.


Is the cautious approach then to treat a p-value in the absence of priors on the same level as a p-value in the presence of unfavorable priors? When someone tests positive on a cancer test, the priors are known (the probability of cancer in the general population is usually very low, and the false positive rate of the test may be relatively high), so that first test is usually merely an indication that further tests are needed. So when you don't know the prior and you observe a low p-value on something, isn't that just "preliminary research" that needs to be further confirmed with other methods or at least the same test but using other data?
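For a sense of the arithmetic in the cancer-test case, here is a minimal sketch of Bayes' rule in Python. The prevalence, sensitivity, and false positive rate are made-up illustrative numbers, not figures for any real test, and `posterior_given_positive` is just a hypothetical helper name.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' rule."""
    # Probability of a positive result from someone who has the disease.
    true_positive = sensitivity * prevalence
    # Probability of a positive result from someone who does not.
    false_positive = false_positive_rate * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumed, illustrative inputs: 1% prevalence, 90% sensitivity,
# 9% false positive rate.
p = posterior_given_positive(
    prevalence=0.01, sensitivity=0.90, false_positive_rate=0.09
)
print(f"P(cancer | positive) = {p:.3f}")
```

With these assumed numbers, a positive result leaves the probability of cancer at roughly 9%, which is why a first positive usually just triggers further testing.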

> Is the cautious approach then to treat a p-value in the absence of priors on the same level as a p-value in the presence of unfavorable priors?

In the presence of a poor prior the Bayesian probability would be biased in some way, so frequentists would say that the p-value in the absence of priors is actually superior in this case. Bayesians would reply that if they thought the prior might be poor then they would simply consider multiple different priors, but it's not clear how this would improve things much over the frequentist approach that simply assumes that the prior is unknown.
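To make the "multiple different priors" idea concrete, here is a rough sketch that treats "the result was significant" as a yes/no event and asks how likely the hypothesis is under several candidate priors. This is a simplification (it ignores the exact p-value), and the power of 0.8, the threshold of 0.05, the list of priors, and the function name are all assumptions chosen for illustration.

```python
def prob_true_given_significant(prior, power=0.8, alpha=0.05):
    """P(hypothesis true | significant result), by Bayes' rule.

    power: assumed chance of a significant result when the hypothesis is true.
    alpha: assumed chance of a significant result when it is false.
    """
    true_and_significant = power * prior
    false_and_significant = alpha * (1.0 - prior)
    return true_and_significant / (true_and_significant + false_and_significant)


# Hypothetical candidate priors, from skeptical to even odds.
priors = [0.01, 0.1, 0.5]
results = {prior: prob_true_given_significant(prior) for prior in priors}

for prior, posterior in results.items():
    print(f"prior = {prior:.2f} -> posterior = {posterior:.2f}")
```

Under these assumptions the posterior ranges from about 0.14 to about 0.94 depending on the prior, so trying several priors mostly tells you how much the conclusion hinges on a quantity you don't know.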

> So when you don't know the prior and you observe a low p-value on something, isn't that just "preliminary research" that needs to be further confirmed with other methods or at least the same test but using other data?

Yes, when you observe a p-value with low significance it should definitely indicate to you that more testing is necessary, either by using different testing methods, gathering new samples, or even just increasing the original sample size if that's possible. What I was trying to suggest in my last paragraph was that this should be the case even when we have highly significant p-values, because even significant p-values are not decisive. So even when we have "confirmatory research" that is highly statistically significant, we should still do all of the things that we would do when we have a p-value with low significance. It is sometimes the case that this subsequent research will overturn even very highly statistically significant results (though often this is unfortunately because mistakes in the original statistical methodology are uncovered).