by taneq 2984 days ago
> But statistically using a p value of 0.05, we'll still reject the null in 5% of experiments. And those experiments will then end up being published in scientific literature. But then this society's scientific literature now only contains false results - literally all published scientific results are false.

The problem with this picture is that it's showing publication as the end of the scientific story, and the acceptance of the finding as fact.

Publication should be the start of the story of a scientific finding. Then additional published experiments replicating the initial publication should comprise the next several chapters. A result shouldn't be accepted as anything other than partial evidence until it has been replicated multiple times by multiple different (and often competing) groups.

We need to start assigning WAY more importance, and way more credit, to replication. Instead of "publish or perish" we need "(publish | reproduce | disprove) or perish".

Edit: Maybe journals could issue "credits" for publishing replications of existing experiments, and require a researcher to "spend" a certain number of credits to publish an original paper?

3 comments

That's a good idea: encourage researchers to focus on a mix of replication and new research. When writing grants, part of that grant might go towards replicating interesting/unexpected results and the rest towards new research. Moreover, given that the experiment has already been designed, replication could end up demanding much less effort from a PI and allow his students to gain some deliberate practice in experiment administration and publication. On the other hand, scholarly publication might have to be changed in order to allow for summary reporting of replication results to stave off a lot of repetition.
My field has less of a "You publish first or you're not interesting" culture than many others, and part of that is recognizing that estimating an effect in a different population, with different underlying variables, is itself an interesting result in its own right.

Tim Lash, the editor of Epidemiology, has some particularly cogent thoughts about replication, including some criticisms of what is rapidly becoming a "one size fits all" approach.

Let's think about costs.

Suppose all experiments use a significance threshold of p = 0.05 and (as the numbers below assume) also miss a true effect 5% of the time. Suppose scientists generate 400 true hypotheses and 400 false hypotheses. One experiment on each hypothesis validates 380 true hypotheses and 20 false ones, for a cost of 800 experiments. If we do one layer of replication on each validated hypothesis, then, among the validated hypotheses, the 380 true will become 361 doubly-validated true hypotheses and 19 once-validated-once-falsified (let's abbreviate "1:1") true hypotheses; the 20 false will become one 2:0 false hypothesis and 19 1:1 false hypotheses; all this increases the cost by 50%. Then it seems clear that doing a third test on the 38 1:1 hypotheses would be decently justified, and those will become 18.05 2:1 true hypotheses, 0.95 1:2 true hypotheses, 0.95 2:1 false hypotheses, and 18.05 1:2 false hypotheses. If we then accept the 2:0 and 2:1 hypotheses, we get 379.05 true and 1.95 false hypotheses (the one 2:0 plus 0.95 2:1) at the cost of 1238 experiments, vs the original of 380 true and 20 false at the cost of 800 experiments; the cost increase is about 55%.

On the other hand, suppose scientists generate 400 true and 4000 false hypotheses. The first experiments yield 380 1:0 true and 200 1:0 false hypotheses, at the cost of 4400 experiments. The validation round yields 361 2:0 true, 19 1:1 true, 10 2:0 false, and 190 1:1 false, costing 580 extra experiments; re-running the 1:1s, we get 18.05 2:1 true, 0.95 1:2 true, 9.5 2:1 false, and 180.5 1:2 false, costing 209 extra experiments. Taking the 2:0 and 2:1s, we get 379.05 true and 19.5 false hypotheses for 5189 experiments, instead of 380 true and 200 false hypotheses costing 4400 experiments; the cost increase is 18%.
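
The arithmetic in the two scenarios above can be checked with a few lines of Python. This is only a sketch of the scheme as described (test once, replicate every positive, run a tie-breaker on every 1:1, accept 2:0 and 2:1), working in expected counts. The function name `replicate` and the parameters `alpha` (false-positive rate) and `beta` (false-negative rate) are labels made up for illustration, and it assumes both rates are 0.05, as the numbers above imply.

```python
def replicate(n_true, n_false, alpha=0.05, beta=0.05):
    """Expected outcome of three rounds of testing (hypothetical helper).

    alpha: chance a false hypothesis passes an experiment (assumed 0.05).
    beta:  chance a true hypothesis fails an experiment (assumed 0.05).
    """
    # Round 1: one experiment per hypothesis.
    cost1 = n_true + n_false
    true_10 = n_true * (1 - beta)      # true hypotheses validated once
    false_10 = n_false * alpha         # false hypotheses validated once

    # Round 2: replicate every validated hypothesis.
    cost2 = true_10 + false_10
    true_20, true_11 = true_10 * (1 - beta), true_10 * beta
    false_20, false_11 = false_10 * alpha, false_10 * (1 - alpha)

    # Round 3: tie-breaker on every 1:1 hypothesis.
    cost3 = true_11 + false_11
    true_21 = true_11 * (1 - beta)
    false_21 = false_11 * alpha

    total = cost1 + cost2 + cost3
    return {
        "accepted_true": true_20 + true_21,     # 2:0 and 2:1 true
        "accepted_false": false_20 + false_21,  # 2:0 and 2:1 false
        "experiments": total,
        "cost_increase": (total - cost1) / cost1,
    }


if __name__ == "__main__":
    for n_false in (400, 4000):
        r = replicate(400, n_false)
        print(
            f"400 true / {n_false} false: "
            f"{r['accepted_true']:.2f} true, {r['accepted_false']:.2f} false, "
            f"{r['experiments']:.0f} experiments, "
            f"{r['cost_increase']:.0%} extra cost"
        )
```

Under those assumptions the extra cost comes out at roughly 55% for the 400/400 case and roughly 18% for the 400/4000 case, with 1238 and 5189 experiments respectively.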

So it's clear that, in a field where lots of false hypotheses are floating around, the cost of extra validation is proportionately not very much, and also you kill more false hypotheses (on average) with every experiment.
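
To see why the overhead shrinks, the cost can be written in general form. The symbols are assumptions introduced here for illustration: T true and F false hypotheses, false-positive rate α, and false-negative rate β (both taken as 0.05 in the numbers above).

```latex
\begin{align*}
\text{round 1 cost} &= T + F \\
\text{round 2 cost} &= T(1-\beta) + F\alpha \\
\text{round 3 cost} &= T(1-\beta)\beta + F\alpha(1-\alpha) \\
\frac{\text{round 2 cost} + \text{round 3 cost}}{\text{round 1 cost}}
  &= \frac{T(1-\beta^{2}) + F\alpha(2-\alpha)}{T + F}
\end{align*}
```

With α = β = 0.05 this gives 438/800, about 55%, for T = F = 400, and 789/4400, about 18%, for T = 400 and F = 4000. As F grows relative to T, the overhead approaches α(2 − α), just under 10%.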

What is the "cost" of believing false hypotheses? It depends on what one does with one's belief. Hmm.

It would be nice if someone made a stab at estimating the overall costs and benefits and making a knock-down argument for more validation.

Some of those false hypotheses were very expensive. Especially those related to nutrition science.
> Maybe journals could issue "credits" for publishing replications of existing experiments, and require a researcher to "spend" a certain number of credits to publish an original paper?

This would cripple small labs, unless people's startup packages come with potentially millions of dollars in funding to get their first few "credits".

It depends on the field; the policy would best be followed in an area like experimental psychology, where replication is not extremely costly (and where it might be an especially large program).