In good old-fashioned statistics there's the idea of the jackknife: for each i, run the regression on all the data except the i-th observation, and store the statistics of interest (coefficients, predictions, etc.). Across all n leave-one-out fits, this gives you a de facto sampling distribution for those statistics.
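To make the jackknife concrete, here's a minimal sketch in plain Python. The toy data, the helper names (ols_slope, jackknife_slopes, jackknife_se) and the choice of a one-variable regression are all my own assumptions for illustration. Note that the leave-one-out estimates are far less spread out than a real sampling distribution, which is why the usual standard-error formula inflates their spread by a factor of (n - 1).

```python
import random


def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def jackknife_slopes(xs, ys):
    """Refit with each observation left out once; return the n slopes."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


def jackknife_se(estimates):
    """Jackknife standard error: sqrt((n-1)/n * sum of squared deviations)."""
    n = len(estimates)
    mean = sum(estimates) / n
    return ((n - 1) / n * sum((e - mean) ** 2 for e in estimates)) ** 0.5


# Toy data, made up for illustration: y = 2x + unit-variance noise.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

slopes = jackknife_slopes(xs, ys)
print(f"full-sample slope: {ols_slope(xs, ys):.3f}")
print(f"jackknife SE:      {jackknife_se(slopes):.3f}")
```

The same loop works for any statistic: swap ols_slope for whatever you want a standard error on.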
Similar, and more common in econometrics, is the bootstrap: refit your model on, say, 1,999 resamples of the data (each the same size as the original, drawn with replacement) and get sampling distributions from those fits.
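A rough sketch of the bootstrap in plain Python, again with made-up toy data and hypothetical helper names (ols_slope, bootstrap_slopes). The percentile interval at the end is just one common way to read the resulting distribution, not the only one.

```python
import random


def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=1999, seed=0):
    """Refit on n_resamples samples of size n drawn with replacement.

    Assumes n is large enough that a resample made of a single repeated
    point (which would make the slope undefined) is practically impossible.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # with replacement
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Toy data, made up for illustration: y = 2x + unit-variance noise.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

boot = sorted(bootstrap_slopes(xs, ys))

# Percentile 95% interval. With 1999 resamples, (B + 1) * 0.025 = 50 and
# (B + 1) * 0.975 = 1950 are whole numbers, so the interval endpoints are
# simply the 50th and 1950th ordered values.
b = len(boot)
lo = boot[round((b + 1) * 0.025) - 1]
hi = boot[round((b + 1) * 0.975) - 1]
print(f"slope: {ols_slope(xs, ys):.3f}, 95% interval: [{lo:.3f}, {hi:.3f}]")
```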
With said sampling distributions, whether from the jackknife or the bootstrap, you're able to test whether your model is valid: what's the probability that it'll have significant coefficients, or an R²/MAE/MAPE score indicating predictive capacity?
Cross-validation (and even scikit-learn now defaults to five folds, not three) is a "lazy" version of this. You don't get a sampling distribution, but at least you're able to tell when a given model only appears good because it grips the data with all its might and then doesn't work out-of-sample.
sklearn even offers the jackknife's leave-one-out scheme under an ML-y name: leave-one-out cross-validation (the LeaveOneOut splitter).
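For comparison, here's roughly what the "lazy" version looks like in scikit-learn, on synthetic data I made up for the purpose. One caveat: leave-one-out CV shares the jackknife's resampling scheme, but it scores the prediction on the held-out point rather than collecting coefficients, so it isn't a drop-in jackknife.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic data, made up for illustration: y = 2x + unit-variance noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(40, 1))
y = 2.0 * X[:, 0] + rng.normal(0.0, 1.0, size=40)

model = LinearRegression()

# Five-fold CV: one out-of-sample R^2 per fold.
kfold_r2 = cross_val_score(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)

# Leave-one-out: each test set is a single point, where R^2 is undefined,
# so score with absolute error instead (sklearn reports it negated).
loo_mae = -cross_val_score(
    model, X, y,
    cv=LeaveOneOut(),
    scoring="neg_mean_absolute_error",
)

print(f"5-fold R^2 per fold: {np.round(kfold_r2, 3)}")
print(f"leave-one-out MAE:   {loo_mae.mean():.3f} over {loo_mae.size} fits")
```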
Yes, but that's not necessarily bad. You want a model that effectively captures the structure present in your dataset. There are currently only rules of thumb in model architecture, and it makes sense to explore the model space to determine which architecture and hyperparameters suit the needs at hand. Two things save this from being a statistical sin. First, the final evaluation set is typically different from the validation set, and evaluation is only performed at the end of the 'fishing expedition', thus providing a reliable measure of the model's ability to generalize. Second, we're doing engineering here, not science, and our goal is to capture the structure of observations, not to make a scientific claim about the values of latent parameters.
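A sketch of that workflow, assuming scikit-learn, synthetic data, and a ridge regression whose penalty stands in for the hyperparameters being fished over (all of these are my choices for illustration): the search only ever sees the training portion, and the held-out set is scored exactly once at the end.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 1.0])
y = X @ true_coef + rng.normal(0.0, 1.0, size=200)

# Set the final evaluation data aside before any model selection happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The "fishing expedition": try several penalties, validated by 5-fold CV
# on the training portion only.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=5, scoring="r2")
search.fit(X_train, y_train)

# One look at the held-out set, after all choices are frozen.
test_r2 = search.score(X_test, y_test)
print(f"chosen alpha: {search.best_params_['alpha']}")
print(f"CV R^2: {search.best_score_:.3f}, held-out R^2: {test_r2:.3f}")
```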