by phreeza
2200 days ago
My understanding is that generally, the error when extrapolating to areas not covered by the training data distribution would be considered to be part of the "bias" part of the bias-variance tradeoff. The way I see it, the variance is the part of the error that you can reduce by collecting more data from your distribution and increasing model complexity if needed. The bias part is what will not get better no matter how much you sample your distribution, and extrapolation problems fall into that category.
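To make that concrete, here is a rough Python sketch. Everything in it is an assumption picked for illustration: the target `true_fn`, the 1-nearest-neighbour model, the noise level, and the ranges. It trains on x in [0, 1] with 10 and then 1000 samples, and measures the error both inside that range and on [2, 3], which the training distribution never covers.

```python
import random

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return x * x

def sample(n, rng, noise=0.01):
    # Training inputs are drawn only from [0, 1].
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0.0, noise)) for x in xs]

def predict_1nn(train, x):
    # 1-nearest-neighbour: return the label of the closest training input.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(train, lo, hi, steps=200):
    # Mean squared error against the noise-free target on an even grid.
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return sum((predict_1nn(train, x) - true_fn(x)) ** 2 for x in grid) / steps

rng = random.Random(0)
results = {}
for n in (10, 1000):
    train = sample(n, rng)
    # (error inside the sampled range, error outside it)
    results[n] = (mse(train, 0.0, 1.0), mse(train, 2.0, 3.0))
    print(n, results[n])
```

In this setup the in-range error should shrink as n grows, while the error on [2, 3] stays large, since no amount of extra sampling from [0, 1] tells the model anything about that region.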
Ah, apologies, I see what you mean. That is true, but this "error" is in-sample error, so increasing your model's variance will increase its ability to interpolate but not extrapolate to out-of-sample data, as I explain in my longer comment.
"In-sample" means all the data you've collected to train and test with. It includes training/validation/test splits. At the end of k-fold cross-validation, your model has "seen" all the data in your sample and the model that performs best is the model that best represents that data.
But because the distribution your data was sampled from is most likely not the true distribution (which is unknown), the sampling error, i.e. the difference between the true and sample distributions, will be reflected in the model. A high-variance model will suffer more from this than a high-bias one.
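A rough way to see this in Python, with the caveat that the setup is entirely made up for illustration (`true_fn`, the noise level, 1-nearest-neighbour as the high-variance model, and a straight-line least-squares fit as the high-bias one): draw many independent samples, fit both models to each, and compare how much their predictions at one fixed point move from sample to sample.

```python
import random
import statistics

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return x * x

def sample(n, rng, noise=0.3):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, true_fn(x) + rng.gauss(0.0, noise)) for x in xs]

def predict_1nn(train, x):
    # High-variance model: copy the label of the closest training input.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def predict_linear(train, x):
    # High-bias model: ordinary least-squares straight line.
    n = len(train)
    mx = sum(p[0] for p in train) / n
    my = sum(p[1] for p in train) / n
    sxx = sum((p[0] - mx) ** 2 for p in train)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in train)
    return my + (sxy / sxx) * (x - mx)

rng = random.Random(0)
x0 = 0.5  # fixed query point inside the sampled range
preds_1nn, preds_lin = [], []
for _ in range(200):
    train = sample(20, rng)  # a fresh sample each time
    preds_1nn.append(predict_1nn(train, x0))
    preds_lin.append(predict_linear(train, x0))

# Spread of each model's prediction across the different samples.
spread_1nn = statistics.pvariance(preds_1nn)
spread_lin = statistics.pvariance(preds_lin)
print(spread_1nn, spread_lin)
```

The 1-nearest-neighbour prediction should vary far more across samples than the linear one, because it reproduces whatever quirks a particular sample happens to have.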
Sorry I didn't understand immediately what you meant. The longer comment above is correct but probably doesn't help answer your question directly.