by dodobirdlord
2317 days ago
The math continues to work out as long as you use the right approach. You have to collect twice as much data, then set half of it aside at random without examining it. After that you can do whatever perverse p-hacking, multi-modeling, or curve-fitting you like to the half you kept until you reach a hypothesis. Then you fix that hypothesis and check it once against the half you set aside, which recovers the statistical significance you lost by using techniques that may have overfit the first half. Unsurprisingly, the math works out because this approach is isomorphic to collecting the first half, studying it to form a hypothesis, then conducting a proper pre-hypothesized experiment to collect the second half. Validation via holdout sets is the same approach used in machine learning and elsewhere to catch models that have overfit their data.
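To make that concrete, here is a minimal Python sketch of the procedure, not anything from a real study. Everything in it is an assumption for illustration: the data is made-up noise (50 features with no real relationship to the outcome), the helper names `split_holdout`, `pearson`, `feature_correlation`, and `best_feature` are hypothetical, and "try every feature and keep the best-looking one" stands in for whatever p-hacking you might do to the half you kept.

```python
import random


def split_holdout(rows, seed=0):
    """Randomly split rows into (explore, holdout) halves without inspecting them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def feature_correlation(rows, j):
    """Correlation between feature j and the outcome over the given rows."""
    return pearson([features[j] for features, _ in rows], [y for _, y in rows])


def best_feature(rows):
    """The 'p-hacking' step: try every feature, keep the strongest-looking one."""
    n_features = len(rows[0][0])
    return max(range(n_features), key=lambda j: abs(feature_correlation(rows, j)))


# Hypothetical data: 200 rows, 50 pure-noise features, outcome unrelated to all of them.
rng = random.Random(42)
rows = [([rng.gauss(0, 1) for _ in range(50)], rng.gauss(0, 1)) for _ in range(200)]

explore, holdout = split_holdout(rows, seed=1)
j = best_feature(explore)                    # hypothesis formed on the explored half only
r_explore = feature_correlation(explore, j)  # selected by search, so likely inflated
r_holdout = feature_correlation(holdout, j)  # single check of the one fixed hypothesis
print(f"feature {j}: explore r={r_explore:.2f}, holdout r={r_holdout:.2f}")
```

The point of the design is that `best_feature` never sees `holdout`, and the holdout half is consulted exactly once, for the one hypothesis chosen beforehand. If you go back and pick a different feature after seeing `r_holdout`, the holdout has become part of the search and the guarantee is gone.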