| > Stopping an experiment once you find a significant effect but before you reach your predetermined sample size is classic P hacking. Although much of the article is basic common sense, and although I'm not a statistician, I had to seriously question the author's understanding of statistics at this point. The predetermined sample size (statistical power) is usually based on an assumption made about the effect size; if the effect size turns out to be much larger than you assumed, then a smaller sample size can be statistically sound. Clinical trials very frequently do exactly this -- stop before they reach a predetermined sample size -- by design, once certain pre-defined thresholds have been passed. Other than not having to spend extra time and effort, the reasons are at least twofold: first, significant early evidence of futility means you no longer have to waste patients' time; second, early evidence of utility means you can move an effective treatment into practice that much sooner. A classic example of this was with clinical trials evaluating the effect of circumcision on susceptibility to HIV infection; two separate trials were stopped early when interim analyses showed massive benefits of circumcision [0, 1]. In experimental studies, early evidence of efficacy doesn't mean you stop there, report your results, and go home; the typical approach, if the experiment is adequately powered, is to repeat it (three independent replicates is the informal gold standard). [0]: https://pubmed.ncbi.nlm.nih.gov/17321310/ [1]: https://pubmed.ncbi.nlm.nih.gov/16231970/ |
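The author is absolutely correct. Early stopping is a classic form of p-hacking: every extra look at the data is another chance for noise alone to cross the significance threshold. See attached image for an illustration.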
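To make this concrete, here is a rough simulation sketch (not taken from the article). It assumes a one-sample z-test on standard normal data with no true effect, and compares "check every 10 observations and stop at the first p < 0.05" against a single test at the planned sample size. The function name, the look schedule, and the sample sizes are all illustrative choices of mine.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_max=100, look_every=10, alpha=0.05,
                         n_sims=2000, seed=0):
    """Simulate experiments where the null is true (mean 0, sd 1).

    Returns (rate when peeking at every look, rate with one final test).
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        total = 0.0
        crossed_at_some_look = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            if n % look_every == 0:
                # z-statistic for the mean with known sd = 1
                z = total / math.sqrt(n)
                if two_sided_p(z) < alpha:
                    crossed_at_some_look = True
        if crossed_at_some_look:
            peeking_hits += 1
        # Only the final look counts for the fixed-sample design.
        if two_sided_p(total / math.sqrt(n_max)) < alpha:
            fixed_hits += 1
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = false_positive_rates()
print(f"stop at first significant look: {peeking_rate:.3f}")
print(f"single test at planned n:       {fixed_rate:.3f}")
```

The fixed-sample rate should land near the nominal 5%, while the stop-when-significant rate comes out several times higher, even though there is no effect at all in the simulated data.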
If you want to be rigorous, you can define criteria for early stopping in advance so that it isn't p-hacking, but you then require relatively stronger evidence at each interim look.
Clinical trials that stop early typically do so at predefined interim analyses, using stricter significance thresholds than a single final analysis would.
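As a sketch of what "predefined looks with a stricter threshold" buys you, the simulation below assumes five equally spaced interim analyses of a one-sample z-test under the null, and compares the naive critical value 1.96 at every look with 2.413, the commonly tabulated Pocock constant for five looks at an overall two-sided 5% level. Pocock is just one such scheme, and I'm using it as an example rather than claiming any particular trial used it; the function and parameter names are made up for illustration.

```python
import math
import random


def overall_false_positive_rate(z_crit, n_looks=5, n_per_look=20,
                                n_sims=4000, seed=1):
    """Fraction of null experiments stopped for 'efficacy' at any look.

    The trial stops at the first interim look where |z| exceeds z_crit.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        n = 0
        for _look in range(n_looks):
            # Enroll the next batch of observations (null: mean 0, sd 1).
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            if abs(total / math.sqrt(n)) > z_crit:
                rejections += 1
                break  # stop the trial early
    return rejections / n_sims


naive_rate = overall_false_positive_rate(z_crit=1.960)   # 0.05 at each look
pocock_rate = overall_false_positive_rate(z_crit=2.413)  # assumed Pocock constant
print(f"1.96 at every look:  {naive_rate:.3f}")
print(f"2.413 at every look: {pocock_rate:.3f}")
```

With the stricter per-look threshold the overall false positive rate should come back to roughly the nominal 5%, which is the sense in which planned early stopping is not p-hacking.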