by prestonh 2983 days ago
Serious defense: p-values have been in use for a long time, and while they are error prone a larger number of true results has been found than false results (according to recent research the ratio is 2:1 reproducible to non-reproducible).

In the cartoon, the scientists are making multiple comparisons, which is not allowed in frequentist hypothesis testing unless you correct for it. One way to do that is the Bonferroni correction: divide the significance threshold ("alpha") by the number of comparisons being made, in this case 20. The cartoon does not state its actual p-value, as most journals would require, but the hope would be that once the threshold is divided by the corrective factor, the significance of that particular comparison goes away.
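To put rough numbers on that correction, here is a minimal Python sketch. The function names are made up for illustration, and the family-wise error formula assumes the 20 tests are independent, which is my assumption rather than something the cartoon states.

```python
def bonferroni_threshold(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance threshold after dividing alpha."""
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    return alpha / num_comparisons


def family_wise_error_rate(per_test_alpha: float, num_comparisons: int) -> float:
    """Chance of at least one false positive when every null is true.

    Assumes the tests are independent (an illustrative assumption).
    """
    return 1.0 - (1.0 - per_test_alpha) ** num_comparisons


alpha = 0.05
num_comparisons = 20  # the number of comparisons in the cartoon

# Testing each comparison at the full alpha, with no correction.
uncorrected = family_wise_error_rate(alpha, num_comparisons)

# Testing each comparison at alpha / 20 instead.
corrected_alpha = bonferroni_threshold(alpha, num_comparisons)
corrected = family_wise_error_rate(corrected_alpha, num_comparisons)

print(f"uncorrected chance of a false positive: {uncorrected:.3f}")
print(f"per-comparison threshold after correction: {corrected_alpha:.4f}")
print(f"corrected chance of a false positive: {corrected:.3f}")
```

Under those assumptions the uncorrected chance of at least one spurious "significant" result is roughly 64%, and the correction brings it back to just under 5%, at the cost of making each individual comparison much harder to pass.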

So p-value methods still lead to a lot of Type I and Type II errors, but in the past they have been the best science has been able to come up with. Actually, probably the greatest issue with false results in the scientific literature is that null results are not publishable. This leads to a case where 20 scientists might independently perform the same experiment where the null is true, and only one finds a significant result (at alpha = .05, about 1 in 20 will by chance). The demand for positive results then acts as a filter that lets only the Type I errors through! This is just one problem with the publishing culture, and it doesn't take into account researchers' tendency to manipulate the data or experiment until p < .05.
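The 20-scientists scenario is easy to simulate. This is only a sketch under stated assumptions: the null is true for every lab, each lab's p-value is uniform on [0, 1] (which is what a well-calibrated test gives under a true null), and only p < alpha gets published. The function name, the number of rounds, and the seed are arbitrary choices of mine.

```python
import random


def simulate_publication_filter(num_labs=20, alpha=0.05, num_rounds=20_000, seed=0):
    """Simulate labs repeating one experiment in which the null is true.

    Each lab's p-value is drawn uniformly from [0, 1]; only results with
    p < alpha are counted as published.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rounds_with_a_paper = 0
    total_papers = 0
    for _ in range(num_rounds):
        p_values = [rng.random() for _ in range(num_labs)]
        published = [p for p in p_values if p < alpha]
        total_papers += len(published)
        if published:
            rounds_with_a_paper += 1
    return {
        "share_of_rounds_with_a_paper": rounds_with_a_paper / num_rounds,
        "papers_per_round": total_papers / num_rounds,
    }


result = simulate_publication_filter()
print(result)
```

Since the null is true in every simulated experiment, every paper that makes it through the filter is a Type I error by construction. The share of rounds producing at least one such paper should come out near 1 - 0.95^20, about 64%, with about one paper per round on average.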

An alternative to the frequentist methodology of using p-values is the Bayesian method, which has its own problems. First, there are practical concerns, such as choosing priors (the initial parameters), which can affect your results despite sometimes being chosen arbitrarily, and the high computational demand of calculating results (less of an issue in the 21st century, which is why the method is seeing a revival in the scientific community). Probably the main problem right now is that practitioners simply aren't familiar with how to employ Bayesian methods, so there's some cultural inertia preventing their immediate adoption.
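As a toy illustration of how the starting parameters can move the answer, here is a sketch using the standard Beta-Binomial conjugate update for a success rate. The data (7 successes in 10 trials) and both priors are made up for the example, and the function name is hypothetical.

```python
def posterior_mean(prior_a: float, prior_b: float, successes: int, trials: int) -> float:
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior.

    Conjugate update: the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    failures = trials - successes
    post_a = prior_a + successes
    post_b = prior_b + failures
    return post_a / (post_a + post_b)


successes, trials = 7, 10  # made-up data

# Flat prior: every success rate is considered equally plausible up front.
flat = posterior_mean(1, 1, successes, trials)

# Skeptical prior: belief concentrated around a 50% success rate.
skeptical = posterior_mean(10, 10, successes, trials)

print(f"posterior mean with flat prior: {flat:.3f}")
print(f"posterior mean with skeptical prior: {skeptical:.3f}")
```

Same data, different priors: the flat prior gives an estimate of about 0.67 and the skeptical one about 0.57. With more data the two converge, but with small samples the choice of prior matters.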

1 comments

while they are error prone a larger number of true results has been found than false results (according to recent research the ratio is 2:1 reproducible to non-reproducible)

It seems odd to talk about "results" as an average across all fields, rather than for a specific field. It's much more common for people to claim that psychology rather than physics has a reproducibility crisis, and thus I don't think it makes sense to talk about the combined reproducibility across both fields. What research are you referencing, and what fields did they look at? Given the differences across fields, if the average is 2:1 reproducible, I'd guess that some fields must be lower than 1:1.

You're right, it definitely depends on the field. The paper I am referencing looked at psychology, I believe. It is likely that a social science would have greater issues with reproducibility than a physical science.