by epistasis 4607 days ago
The vast majority of scientific papers are not single experiments with one p-value, but rather anywhere from a handful to a dozen or more experiments, only some of which may be reduced to a p-value. And in most biological research, at least two lines of evidence are required before a reviewer will accept a claim (e.g. "OK, you may have found something, now verify it with a PCR.").

So this entire setup is just kind of crap, and not representative of scientific research.

In addition, the underlying point, which is quite interesting and necessary to keep in mind when interpreting multiple p-values, is widely acknowledged in the field, which is why False Discovery Rate methods started to be used as far back as the 90s. It was first published as a "The sky is falling, what are all you idiot medical researchers doing?!" type of paper by Ioannidis, which is a great way to make a name for oneself. However, even his own interpretation did not hold up well, and he has stopped pushing the point. Summarizing an extensive comment on Metafilter [1]:

>Why Most Published Research Findings Are False: Problems in the Analysis

>The article published in PLoS Medicine by Ioannidis makes the dramatic claim in the title that “most published research claims are false,” and has received extensive attention as a result. The article does provide a useful reminder that the probability of hypotheses depends on much more than just the p-value, a point that has been made in the medical literature for at least four decades, and in the statistical literature for decades previous. This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. Unfortunately, while we agree that there are more false claims than many would suspect (based both on poor study design, misinterpretation of p-values, and perhaps analytic manipulation), the mathematical argument in the PLoS Medicine paper underlying the “proof” of the title's claim has a degree of circularity. As we show in detail in a separately published paper, Dr. Ioannidis utilizes a mathematical model that severely diminishes the evidential value of studies, even meta-analyses, such that none can produce more than modest evidence against the null hypothesis, and most are far weaker. This is why, in the offered “proof,” the only study types that achieve a posterior probability of 50% or more (large RCTs [randomized controlled trials] and meta-analysis of RCTs) are those to which a prior probability of 50% or more are assigned. So the model employed cannot be considered a proof that most published claims are untrue, but is rather a claim that no study or combination of studies can ever provide convincing evidence.

>ASSESSING THE UNRELIABILITY OF THE MEDICAL LITERATURE: A RESPONSE TO "WHY MOST PUBLISHED RESEARCH FINDINGS ARE FALSE"

>A recent article in this journal (Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124) argued that more than half of published research findings in the medical literature are false. In this commentary, we examine the structure of that argument, and show that it has three basic components:

>1) An assumption that the prior probability of most hypotheses explored in medical research is below 50%.

>2) Dichotomization of P-values at the 0.05 level and introduction of a “bias” factor (produced by significance-seeking), the combination of which severely weakens the evidence provided by every design.

>3) Use of Bayes theorem to show that, in the face of weak evidence, hypotheses with low prior probabilities cannot have posterior probabilities over 50%.

>Thus, the claim is based on a priori assumptions that most tested hypotheses are likely to be false, and then the inferential model used makes it impossible for evidence from any study to overcome this handicap. We focus largely on step (2), explaining how the combination of dichotomization and “bias” dilutes experimental evidence, and showing how this dilution leads inevitably to the stated conclusion. We also demonstrate a fallacy in another important component of the argument – that papers in “hot” fields are more likely to produce false findings. We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims. But calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.
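
To make the three components in that abstract concrete, here is a rough Python sketch of the kind of dichotomized Bayes calculation being criticized. It is not code from either paper: the function name `posterior_true`, the way the "bias" factor is modeled (a fraction of would-be non-significant analyses that get reported as significant anyway), and all the numbers in the loop are my own illustrative assumptions.

```python
def posterior_true(prior, power, alpha=0.05, bias=0.0):
    """Posterior probability that a 'significant' finding reflects a true effect.

    Assumed model (illustrative only):
      prior -- prior probability that the tested hypothesis is true
      power -- P(significant | true effect), i.e. 1 - beta
      alpha -- P(significant | no effect), the dichotomization threshold
      bias  -- fraction of non-significant analyses reported as significant
    """
    # Bias converts some non-significant results into reported positives,
    # for true and false hypotheses alike.
    p_sig_given_true = power + bias * (1.0 - power)
    p_sig_given_false = alpha + bias * (1.0 - alpha)
    numerator = prior * p_sig_given_true
    return numerator / (numerator + (1.0 - prior) * p_sig_given_false)


if __name__ == "__main__":
    # Hypothetical scenarios: (label, prior, power, bias)
    scenarios = [
        ("well-powered, even prior, no bias", 0.5, 0.8, 0.0),
        ("well-powered, even prior, some bias", 0.5, 0.8, 0.3),
        ("underpowered, low prior, no bias", 0.1, 0.2, 0.0),
        ("underpowered, low prior, some bias", 0.1, 0.2, 0.3),
    ]
    for label, prior, power, bias in scenarios:
        print(f"{label}: {posterior_true(prior, power, bias=bias):.3f}")
```

Under this model, once the evidence is reduced to "significant or not" and a bias term is mixed in, a low prior cannot be pushed past 50%, which is the circularity the commentary complains about.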

[1] http://www.metafilter.com/133102/There-is-no-cost-to-getting...


What's infuriatingly ignored is that in that very same PLoS Medicine issue is a response to Ioannidis' work by Greenland, IIRC, that notes that by "False" he means the significance is wrong, but what's really of interest is the effect measure.

On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.

It's true that effect sizes are often more important, but they are also often incorrect. See e.g.

Ioannidis, J. P. A. (2008). Why Most Discovered True Associations Are Inflated. Epidemiology, 19(5), 640–648. doi:10.1097/EDE.0b013e31818131e7

Most studies are underpowered and are incapable of detecting the true effect. Only if they get lucky and observe an abnormally large effect will they obtain a statistically significant result, so the published results tend to be significant overestimates.
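
A quick way to see this is to simulate it. The sketch below is mine, not taken from either cited paper, and every number in it is an assumption chosen for illustration: a true standardized effect of 0.2, 20 subjects per group, known unit variance, and a two-sided z-test at alpha = 0.05. Each simulated study's estimate is drawn directly from its sampling distribution.

```python
import random
import statistics


def simulate_significance_filter(true_effect=0.2, n_per_group=20,
                                 n_studies=20000, seed=0):
    """Simulate many small two-group studies and keep the 'significant' ones.

    Assumes known unit variance, so the difference in means is normally
    distributed around true_effect with standard error sqrt(2 / n_per_group).
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5
    z_crit = 1.959963984540054  # two-sided 5% critical value

    all_estimates = []
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)
        all_estimates.append(estimate)
        if abs(estimate / se) > z_crit:
            significant.append(estimate)

    return {
        "power": len(significant) / n_studies,
        "mean_all": statistics.fmean(all_estimates),
        # Average magnitude among studies that passed the significance filter
        "mean_abs_significant": statistics.fmean(abs(e) for e in significant),
    }


if __name__ == "__main__":
    result = simulate_significance_filter()
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

With these settings an estimate has to exceed roughly 0.62 in magnitude to reach significance, so every "publishable" estimate is at least three times the assumed true effect of 0.2, even though the average over all studies is unbiased.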

For another good example, see

Gelman, A., & Weakliem, D. (2009). Of beauty, sex, and power: statistical challenges in estimating small effects. American Scientist, 97, 310–316.

http://www.stat.columbia.edu/~gelman/research/unpublished/po...

I think part of the point there is not to pass effect estimates through a significance-test filter first. Most studies are underpowered to detect a true effect at alpha = 0.05. That doesn't suggest that most studies are wrong so much as that, when an underpowered study fails to reach significance, we declare it dull and uninteresting.

Ironically, the Ioannidis paper is in Epidemiology, which is a journal that is fairly anti-significance testing, but where I still get reviewers suggesting that an effect measure with a confidence interval that brushes against the null must mean nothing at all.

>On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.

This is a fair question. I think the reasons the Ioannidis paper was persuasive are that

1) Ioannidis replicated earlier results about the lack of replication of most research reports,

and

2) Ioannidis "showed the work" for how possible, and indeed likely, it is for an effect size that permits a false-positive finding to be published, under reasonable assumptions about the prevalence of false-positive findings and publishing practices. Most scientists were vaguely aware of lack of replication years before anyone heard of Ioannidis, but not many scientists were fully aware of how readily a false-positive finding can be published.

>On a meta level, I've always wondered why we take a paper about most findings being false as clearly correct.

Because in science, not believing things is the default state. If you say, "most published findings are false", you're really saying, "most of the time we have to accept the null hypothesis", which is what we all not-so-secretly believe regarding everything, all the time, in any case.