|
|
|
|
|
by MrEldritch
2318 days ago
|
|
basically, because once you start trying multiple hypotheses on the same dataset, the math used to determine "is this conclusion real, or am I just fooling myself" begins to break down. The statistical significance threshold usually used is p<0.05, meaning that something is (generally, this is beginning to change since the replication crisis) considered to be a real discovery if it has less than a 1/20 chance of being a false positive under the chosen model. As soon as you start trying multiple hypotheses, then that 1/20 chance of being a false positive begins to become meaningless. If you can just keep rolling d20s until one of them comes up with a critical hit, then you can easily generate false positives that still look very robust. This is exactly the sort of bad science - p-hacking, fishing expeditions, and the garden of forking paths - that led to the replication crisis. (And that makes sense, as this paper is from 2013, and predates the widespread discovery of the crisis) |
|