Hacker News
by stiff 2680 days ago
A dishonest scientist can mine a dataset for statistically significant hypotheses, and for a long time there was no institutional protection against this:

https://en.wikipedia.org/wiki/Data_dredging

https://www.xkcd.com/882/
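
To put a rough number on the dredging problem, here is a minimal sketch in Python. It assumes independent tests where every null hypothesis is actually true; the function names, the choice of 20 tests, and the 0.05 threshold are illustrative assumptions of mine, not something taken from the links above.

```python
import random


def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_dredging(alpha: float, num_tests: int, num_studies: int, seed: int = 0) -> float:
    """Fraction of simulated studies that find at least one 'significant'
    result purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        # Under a true null hypothesis, a p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    alpha, num_tests = 0.05, 20  # illustrative values
    print("analytic:  ", family_wise_error_rate(alpha, num_tests))
    print("simulated: ", simulate_dredging(alpha, num_tests, 10_000))
    # Bonferroni correction: test each hypothesis at alpha / num_tests instead.
    print("bonferroni:", family_wise_error_rate(alpha / num_tests, num_tests))
```

With 20 tests at the 0.05 level, the chance of at least one spurious "discovery" comes out to roughly 64%. Dividing the threshold by the number of tests (the Bonferroni correction) brings it back under 5%, but only if the number of tests is counted honestly.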

Machine learning makes it easier to test a great many hypotheses, but even when working fully "by hand" it is very easy to deviate from what the statistical framework of hypothesis testing demands. There is now some discussion of countermeasures, e.g. preregistration of studies:

http://www.sciencemag.org/news/2018/09/more-and-more-scienti...

You can see this as another chapter in the long debate about the correct way to test scientific hypotheses:

https://en.wikipedia.org/wiki/Statistical_hypothesis_testing...

1 comment

As your number of samples increases, so does the chance that a hidden variable, one that correlates with the thing you're testing, is what actually explains the phenomenon.

All experiments have a limit, it seems.