| HN Mirror

by eszaq 1476 days ago

A specific form of bad research which could be at play here: The researchers programmatically searched the space of possible control variables to include until they came up with a model which maximized the apparent effect size for coffee, so they could publish an interesting & widely cited paper that looked good on their resumes.

With N candidate control variables, each of which can be either included or ignored, there are 2^N possible sets of control variables. Odds are decent that at least one of those regressions shows a large effect size for coffee.
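To make the 2^N count concrete, here is a rough Python sketch. The control names are made up for illustration, and `chance_of_any_fluke` rests on a strong simplifying assumption: that each specification independently has a 5% chance of showing a spuriously large effect. Real regressions on the same data are correlated, so treat the number as a back-of-the-envelope intuition, not an estimate.

```python
from itertools import combinations


def control_sets(controls):
    """Yield every subset of the candidate controls, including the empty set."""
    for k in range(len(controls) + 1):
        yield from combinations(controls, k)


def chance_of_any_fluke(n_controls, p_single=0.05):
    """Probability that at least one of the 2^N specifications is a fluke.

    ASSUMPTION: specifications are independent and each has probability
    `p_single` of a spurious large effect. This is not true of regressions
    run on the same data set; it only illustrates the scale of the problem.
    """
    return 1 - (1 - p_single) ** (2 ** n_controls)


# Hypothetical control names, for illustration only.
controls = ["age", "smoking", "income", "exercise"]
specs = list(control_sets(controls))

print(len(specs))  # 2**4 = 16 possible sets of controls
print(round(chance_of_any_fluke(len(controls)), 2))
```

Under that independence assumption, even four candidate controls give better-than-even odds of at least one fluke, and the count of specifications doubles with every control added.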

I would trust this sort of research more if instead of publishing a particular set of control variables obtained by an unspecified method, the researchers chose 100 of those 2^N possible sets of control variables at random, then published the average effect size from the 100 resulting regressions. Ideally they would make the code to reproduce this average effect size publicly available, so anyone could easily replicate using another 100 randomly generated regressions.
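A minimal sketch of what that procedure might look like, using only the Python standard library. Everything here is an assumption for illustration: the column names (`coffee`, `health`, `c0`..`c5`), the synthetic data with no true coffee effect, and the plain least-squares fit via the normal equations. Control sets are drawn with replacement by flipping a fair coin for each candidate, which samples uniformly from the 2^N subsets.

```python
import random


def ols_coefs(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def treatment_effect(data, outcome, treatment, controls):
    """Coefficient on `treatment` when regressing `outcome` on it plus controls."""
    X = [[1.0, row[treatment]] + [row[c] for c in controls] for row in data]
    y = [row[outcome] for row in data]
    return ols_coefs(X, y)[1]  # index 0 is the intercept


def average_effect(data, outcome, treatment, candidates, n_specs=100, seed=0):
    """Average the treatment coefficient over randomly chosen control sets."""
    rng = random.Random(seed)  # a published seed makes the run reproducible
    effects = []
    for _ in range(n_specs):
        # Include each candidate with probability 1/2: uniform over 2^N subsets.
        chosen = [c for c in candidates if rng.random() < 0.5]
        effects.append(treatment_effect(data, outcome, treatment, chosen))
    return sum(effects) / len(effects), effects


# Synthetic data (hypothetical): pure noise, so the true coffee effect is zero.
data_rng = random.Random(42)
names = [f"c{i}" for i in range(6)]
data = []
for _ in range(200):
    row = {name: data_rng.gauss(0, 1) for name in names}
    row["coffee"] = data_rng.gauss(0, 1)
    row["health"] = data_rng.gauss(0, 1)
    data.append(row)

avg, effects = average_effect(data, "health", "coffee", names)
print(f"average over {len(effects)} specs: {avg:.3f}")
print(f"most extreme single spec: {max(effects, key=abs):.3f}")
```

Changing `seed` gives a different 100 control sets, which is the replication check described above: if the average moves a lot between seeds, the headline effect depends heavily on which controls were picked.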