|
|
|
|
|
by thaumasiotes
3338 days ago
|
|
It means that you can't use the same data to confirm a hypothesis as you used to generate the hypothesis. Defensible statistical practice would be to throw anything you like at the original data set, come up with whatever ridiculous idea, and then collect a new data set for the purpose of investigating your ridiculous idea. The original data set provides zero[1] evidence for a hypothesis that it inspired you to think of. [1] Not really, but this is the cleanest way to sidestep multiple comparisons. |
|
(1) First, you can certainly have confidence in hypotheses based off single data sets. If you have a dataset with 1 million hours of TV watching that show 0 correlation between watching golf and watching Judge Judy, it's fine to suspect there's little correlation. You don't need to run a second study to have an informed opinion.
(2) Second, collecting new data sets (or equivalently blinding yourself to partitions) doesn't 100% fix the problem either. If you test lots of hypotheses against your test set, then the odds that some of them are false rises too. Creating third- and fourth- and fifth-level validation sets just keeps pushing the problem up the ladder. In fact, there's no real difference between the requirement to experimentally validate results and the requirement to have a hypothesis 'work' on both halves of a partitioned dataset. The data doesn't care when you collected it.
Ultimately we just have to admit that tests based on randomness are sometimes randomly wrong. There is no perfect silver bullet solution.