Hacker News
by pama 861 days ago
Yes, it does. Think of the usual notions of statistical significance when testing one new idea prospectively, say a p-value, or the AUC used in the paper. Now picture instead a rich dataset where you are free to fish among possibly tens of thousands of signals for one signal, or a combination of signals, that matches your result. Loosely speaking, you are overfitting, and the threshold for being surprised (that is, for statistical significance) becomes much stricter. The Bonferroni correction, for example, divides the significance level by the number of tests you ran.

https://en.wikipedia.org/wiki/Bonferroni_correction
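To make the fishing argument concrete, here is a minimal stdlib-only Python sketch. It assumes an outcome and candidate signals that are all independent Gaussian noise, so no signal truly predicts anything, and it uses the Fisher z-transform as a normal approximation for the correlation p-value. The function names and parameters (`fish_for_signals`, `n_signals`, etc.) are hypothetical illustrations, not anything from the paper.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho == 0, via the Fisher z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def fish_for_signals(n_samples=50, n_signals=2000, alpha=0.05, seed=0):
    """Test many pure-noise signals against a pure-noise outcome (hypothetical setup)."""
    rng = random.Random(seed)
    # The outcome is pure noise: by construction, nothing truly predicts it.
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    p_values = []
    for _ in range(n_signals):
        signal = [rng.gauss(0, 1) for _ in range(n_samples)]
        p_values.append(approx_p_value(pearson_r(signal, outcome), n_samples))
    # Naive: each test judged at alpha as if it were the only one run.
    naive_hits = sum(p < alpha for p in p_values)
    # Bonferroni: each test must clear alpha / (number of tests).
    bonferroni_hits = sum(p < alpha / n_signals for p in p_values)
    return min(p_values), naive_hits, bonferroni_hits


if __name__ == "__main__":
    best_p, naive, bonf = fish_for_signals()
    print(f"smallest p-value among 2000 noise signals: {best_p:.2g}")
    print(f"'significant' at 0.05 without correction: {naive}")
    print(f"significant after Bonferroni: {bonf}")
```

Without correction, roughly 5% of the noise signals (on the order of 100 out of 2000) typically clear p < 0.05, and the best one looks very convincing; after dividing the threshold by the number of tests, essentially none survive.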


Sure, but let's say that we test this and it is predictive on new data (not overfitting), but we have no idea at all how it works. It's still a useful test.
A retrospective regression on a specific dataset might discover a truly correlated quantity, provided such quantities exist and their signal is more prominent than the spurious combinations you get from noise. However, by design, this analysis will always discover some quantity that correlates. Such retrospective studies can prompt prospective studies of a correlated quantity (a biomarker, in this case), and a careful analysis of the retrospective study's methodology and results can guide the design of those prospective studies; if a prospective study works, that is fantastic. Retrospective studies are mostly there for statisticians to figure things out for future tests, except when the signal is simple and phenomenal.
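A rough way to see the retrospective-versus-prospective distinction is a small simulation. This is a sketch under hypothetical assumptions: Gaussian features, a single planted feature (index 0) whose strength is set by `true_effect`, and, for brevity, fishing over single features rather than combinations. The retrospective step picks whichever feature correlates best; the prospective step draws a fresh cohort from the same process and tests only that one feature.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_cohort(rng, n_samples, n_features, true_effect):
    """Hypothetical cohort: feature 0 carries `true_effect` of real signal, the rest are noise."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    outcome = [true_effect * f0 + rng.gauss(0, 1) for f0 in features[0]]
    return features, outcome


def retrospective_then_prospective(true_effect, n_samples=60, n_features=2000, seed=1):
    """Return (chosen feature index, retrospective |r|, prospective |r|)."""
    rng = random.Random(seed)
    # Retrospective step: fish for the single best-correlated feature.
    feats, y = make_cohort(rng, n_samples, n_features, true_effect)
    abs_rs = [abs(pearson_r(f, y)) for f in feats]
    best = max(range(n_features), key=abs_rs.__getitem__)
    # Prospective step: a fresh cohort; the chosen feature is tested once, no fishing.
    new_feats, new_y = make_cohort(rng, n_samples, n_features, true_effect)
    return best, abs_rs[best], abs(pearson_r(new_feats[best], new_y))


if __name__ == "__main__":
    for effect in (0.0, 2.0):
        best, retro, prosp = retrospective_then_prospective(effect)
        print(f"true_effect={effect}: feature {best}, retro |r|={retro:.2f}, prosp |r|={prosp:.2f}")
```

With no real signal, the retrospective step still reports a sizeable correlation (it always finds one), which then largely vanishes on the new cohort. With a strong planted signal, the real feature stands out above the noise and its correlation holds up prospectively.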