|
Here's a simpler thought experiment that gets across why p(null | significant effect) ≠ p(significant effect | null), and why p-values are flawed as stated in the post. Imagine a society where scientists are really, really bad at hypothesis generation. In fact, they're so bad that they only test null hypotheses that are true. So in this hypothetical society, the null hypothesis in every scientific experiment ever done is true. But with a significance threshold of p < 0.05, we'll still reject the null in about 5% of experiments. And those experiments will then end up being published in the scientific literature. So this society's scientific literature contains only false results - literally all published scientific results are false. Of course, in real life, we hope that our scientists have better intuition for what is in fact true - that is, we hope that the "prior" probability in Bayes' theorem, p(null), is not 1. |
The problem with this picture is that it treats publication as the end of the scientific story, and as the point where the finding is accepted as fact.
Publication should be the start of the story of a scientific finding. Then additional published experiments replicating the initial publication should comprise the next several chapters. A result shouldn't be accepted as anything other than partial evidence until it has been replicated multiple times by multiple different (and often competing) groups.
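To put rough numbers on why replication matters, here is a minimal Bayes' theorem sketch. Everything in it is an assumption made for illustration: the function name `posterior_null` is made up, the prior p(null) = 0.9 and the statistical power of 0.8 are invented values, and the replications are treated as fully independent, which real ones often are not.

```python
def posterior_null(prior_null, alpha=0.05, power=0.8, k=1):
    """p(null is true | k independent experiments all came out significant).

    prior_null: assumed prior probability that the null is true
    alpha:      significance threshold (false positive rate under the null)
    power:      assumed chance of a significant result when the effect is real
    k:          number of independent significant results (original + replications)
    """
    like_null = alpha ** k   # k false positives in a row
    like_alt = power ** k    # k true positives in a row
    numerator = prior_null * like_null
    return numerator / (numerator + (1 - prior_null) * like_alt)


if __name__ == "__main__":
    for k in range(1, 4):
        print(k, round(posterior_null(0.9, k=k), 4))
```

Under these assumed numbers, a single significant result still leaves about a 36% chance that the null is true; one independent replication brings that down to roughly 3%, and a second to well under 1%. Setting `prior_null=1.0` reproduces the thought experiment above: the posterior stays at 1 no matter how many significant results pile up.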
We need to start assigning WAY more importance, and way more credit, to replication. Instead of "publish or perish" we need "(publish | reproduce | disprove) or perish".
Edit: Maybe journals could issue "credits" for publishing replications of existing experiments, and require a researcher to "spend" a certain number of credits to publish an original paper?