| I agree with kharms: "You're describing P-value hacking." Here's an example of what can happen when you take a huge corpus of data and throw an equally huge number of hypotheses at it to see what sticks:
https://io9.gizmodo.com/i-fooled-millions-into-thinking-choc... tl;dr: he "proved" chocolate causes weight loss by comparing chocolate eaters and non-chocolate eaters on a large number of health indicators. That also introduces the multiple testing problem: https://www.wikiwand.com/en/Multiple_comparisons_problem. The more statistical tests you run against a set of data (EDIT: the more variables you test against a dataset), the higher the chance that you get a statistically significant result from random error alone. |
There are really three solutions to the problem of multiple comparisons: (1) use a stricter threshold, (2) use a different test, and/or (3) interpret the result correctly, remembering that p = 5% does not imply the effect is 95% likely to be real.
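To put rough numbers on option (1), here's a minimal Python sketch. It assumes the tests are independent and that every null hypothesis is true, which real data rarely satisfies exactly, and it uses the Bonferroni correction as one common way to pick a stricter threshold (not the only one). The function names are mine, purely for illustration.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across `num_tests` tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha = 0.05
    for num_tests in (1, 10, 20, 100):
        uncorrected = family_wise_error_rate(alpha, num_tests)
        per_test = bonferroni_threshold(alpha, num_tests)
        corrected = family_wise_error_rate(per_test, num_tests)
        print(
            f"{num_tests:>3} tests: "
            f"P(any false positive) = {uncorrected:.3f} uncorrected, "
            f"{corrected:.3f} with per-test threshold {per_test:.5f}"
        )
```

Under those assumptions, 20 tests at p < 0.05 give you roughly a 64% chance of at least one "significant" result from noise alone, while the corrected threshold keeps that chance just under 5%. Bonferroni is conservative, so with many correlated variables it can cost a lot of power.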
There's absolutely nothing wrong with exploring a data set, as long as you are responsible in the conclusions you draw.