by phreeza 2197 days ago
The bias/variance trade-off is not really related to extrapolation. Think of a point cloud following a quadratic shape. A linear model will extrapolate terribly.
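To make the quadratic point-cloud example concrete, here is a minimal sketch in plain Python. It assumes noise-free points sampled from y = x² on x = 0..5, and `fit_line` is a hypothetical helper, not anything from the thread. It fits a least-squares line (a high-bias model), then compares the line's worst error on the sampled points with its error at x = 10, far outside the sampled range.

```python
# Fit a straight line (a high-bias model) to noise-free points on y = x**2,
# then compare its error inside the sampled range with its error far outside it.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

xs = [0, 1, 2, 3, 4, 5]          # training inputs (the "point cloud")
ys = [x ** 2 for x in xs]        # quadratic ground truth
a, b = fit_line(xs, ys)

# Largest error on the points the line was fitted to.
in_range_err = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))
# Error at x = 10, well outside the sampled range [0, 5].
extrap_err = abs(10 ** 2 - (a + b * 10))

print(f"line: y = {a:.3f} + {b:.3f}x")
print(f"max in-range error: {in_range_err:.3f}")
print(f"error at x=10:      {extrap_err:.3f}")
```

With these six points the fitted line is y = -3.333 + 5x. Its worst error on the training points is about 3.33, but at x = 10 it predicts about 46.7 against a true value of 100.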

Well, "more predictive" doesn't mean it's a perfect fit. Every model has error. A line through a point cloud curving upwards will still represent some of the points in the cloud. So it will have high error, but it's still a representation of the data.

And yes, the bias-variance tradeoff is about generalisation (i.e. the ability to extrapolate to unseen data). But this is more related to the fact that in the real world, problem spaces don't have nice, friendly, regular shapes nor do their shapes stay put after we've trained a model.

My understanding is that generally, the error when extrapolating to areas not covered by the training data distribution would be considered to be part of the "bias" part of the bias-variance tradeoff.

The way I see it, the variance is the part of the error that you can reduce by collecting more data from your distribution and increasing model complexity if needed.

The bias part is what will not get better no matter how much you sample your distribution, and extrapolation problems fall into that category.
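One way to check this intuition is a Monte Carlo bias-variance decomposition. The sketch below rests on assumptions of my own: the true function is y = x², training inputs are drawn uniformly from [0, 1], the noise is Gaussian with σ = 0.1, the model is a least-squares line, and `bias_variance_at` is a hypothetical helper. It estimates the squared bias and the variance of the prediction at x0 = 2, outside the training range, for a small and a large training set.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def bias_variance_at(x0, n, trials=200, noise=0.1, seed=0):
    """Monte Carlo estimate of (squared bias, variance) of a fitted line at x0.

    Each trial draws a fresh training set of n points with x ~ Uniform(0, 1)
    and y = x**2 + Gaussian noise, fits a line, and records its prediction at x0.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [x ** 2 + rng.gauss(0, noise) for x in xs]
        a, b = fit_line(xs, ys)
        preds.append(a + b * x0)
    mean_pred = sum(preds) / trials
    # Squared bias: systematic error of the average fitted model at x0.
    bias_sq = (mean_pred - x0 ** 2) ** 2
    # Variance: spread of the prediction across independent training sets.
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, variance

for n in (10, 1000):
    bias_sq, var = bias_variance_at(x0=2.0, n=n)
    print(f"n={n:5d}  bias^2={bias_sq:.3f}  variance={var:.5f}")
```

If the intuition holds, the variance should drop sharply from n = 10 to n = 1000, while the squared bias stays near (4 - 11/6)² ≈ 4.7, because the best-fitting line to x² on [0, 1] is y = x - 1/6 and no amount of extra data from [0, 1] changes that.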

>> The way I see it, the variance is the part of the error that you can reduce by collecting more data from your distribution and increasing model complexity if needed.

Ah, apologies, I see what you mean. That is true, but this "error" is in-sample error, so increasing your model's variance will increase its ability to interpolate but not extrapolate to out-of-sample data, as I explain in my longer comment.

"In-sample" means all the data you've collected to train and test with. It includes training/validation/test splits. At the end of k-fold cross-validation, your model has "seen" all the data in your sample and the model that performs best is the model that best represents that data.

But, because the data was sampled from a distribution that is most likely not the true distribution of the data (since that distribution is unknown), the sampling error (i.e. the differences between the true and sample distributions) will be reflected in the model. A high-variance model will suffer more from this than a high-bias one.
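For reference, here is a minimal k-fold split in plain Python. The `k_fold_indices` helper is hypothetical; library versions such as scikit-learn's `KFold` can also shuffle the data first, which this sketch omits. It shows the "in-sample" point made above: across the k folds every sample is used for validation exactly once and for training k - 1 times, so by the end the whole sample has been seen.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n samples.

    Samples are assigned to folds in contiguous blocks; the first n % k folds
    get one extra sample so every index lands in exactly one validation fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

n, k = 10, 3
seen_in_validation = []
for fold, (train, val) in enumerate(k_fold_indices(n, k)):
    print(f"fold {fold}: train={train} val={val}")
    seen_in_validation.extend(val)

# Across the k folds, every sample has been used for validation once
# (and for training k - 1 times): the whole sample has been "seen".
print(sorted(seen_in_validation) == list(range(n)))
```

With n = 10 and k = 3 the validation folds are indices 0-3, 4-6 and 7-9, and the last line prints True.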

Sorry I didn't understand immediately what you meant. The longer comment above is correct but probably doesn't help answer your question directly.

Thanks for taking the time to write the detailed responses. Definitely led me to think more closely about these vaguely held intuitions about bias and variance! I think you are exactly right that the crucial aspect is the variance when looking at out-of-sample predictions, not just across several samplings from the original training distribution (à la k-fold cross-validation).

Bias and variance are characteristics of the model, not components of its error as I think you're saying. In the simplest sense, bias and variance refer to the shape of the function represented by the model (let's say "the shape of the model" for simplicity). A model with a more "rigid" shape (approaching a straight line) has more bias and one with a more "relaxed" shape (further from a straight line) has more variance.

The extent to which a model can extrapolate to out-of-sample data depends on how well the shape of the model follows the true distribution of the data. This is true regardless of the bias and variance of the model. It just happens that most of the time, in interesting, real-world problems, the true distribution of the data is more or less different than the sampling distribution of the training data, i.e. there's always some amount of "sampling error".

Sampling error can't be reduced by collecting more training data: you just have more data with the same sampling error. Increasing model complexity increases variance, so if you start with high sampling error, you will get a high error on out-of-sample data because your model matches the "off" distribution of the training data too closely. What training with more data and with a more complex model can do is increase the ability of the trained model to interpolate, i.e. to accurately represent (new) data points that are in the same region of "instance space" as the training data points.
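A small sketch of that last point, with 1-nearest-neighbour regression standing in for a high-variance model (my substitution; the thread does not name a specific model). Assumptions: the true function is y = x², training inputs are drawn only from [0, 1], so the sample never covers [2, 3], there is no noise, and all helper names are hypothetical.

```python
import bisect
import random

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour regression; train_x must be sorted ascending."""
    i = bisect.bisect_left(train_x, x)
    # The nearest training point is one of the two neighbours of the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(train_x)]
    best = min(candidates, key=lambda j: abs(train_x[j] - x))
    return train_y[best]

def mse(train_x, train_y, queries):
    """Mean squared error against the true function y = x**2."""
    errs = [(nn_predict(train_x, train_y, x) - x ** 2) ** 2 for x in queries]
    return sum(errs) / len(errs)

def sample_training_set(n, rng):
    # Assumed sampling distribution: x ~ Uniform(0, 1) only, although queries
    # may come from [0, 3]. More samples from here never cover [2, 3].
    xs = sorted(rng.random() for _ in range(n))
    return xs, [x ** 2 for x in xs]

rng = random.Random(42)
inside = [i / 100 for i in range(101)]        # queries inside the sampled range
outside = [2 + i / 100 for i in range(101)]   # queries outside it

results = {}
for n in (10, 1000):
    xs, ys = sample_training_set(n, rng)
    results[n] = (mse(xs, ys, inside), mse(xs, ys, outside))
    print(f"n={n:4d}  interpolation MSE={results[n][0]:.6f}  "
          f"extrapolation MSE={results[n][1]:.3f}")
```

Going from 10 to 1,000 training points should shrink the interpolation error by orders of magnitude, while the extrapolation error stays around 30, because the nearest neighbour of every query in [2, 3] is the largest training x, just below 1.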

A high-bias model can extrapolate well if the sampling error is not too high and the shape of the true distribution is not too irregular. However, a high-bias model will also not interpolate as well as a high-variance model. Its rigid structure will "miss" many data points. Like you say, this will not change if you train with more data. Anyway, that's the tradeoff.
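To illustrate both halves of that tradeoff, here is a sketch comparing a least-squares line (high bias) with 1-nearest-neighbour regression (high variance; again my substitution for "a high-variance model"). Assumed setup: the true function is 2x + 0.3·sin(20x), a linear trend with a small, regular wiggle, and 500 noise-free training points are drawn from [0, 1]. The names `true_f`, `line_model` and `knn_model` are hypothetical.

```python
import bisect
import math
import random

def true_f(x):
    # Assumed ground truth: a linear trend plus a small, regular wiggle.
    return 2 * x + 0.3 * math.sin(20 * x)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour regression; train_x must be sorted ascending."""
    i = bisect.bisect_left(train_x, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(train_x)]
    best = min(candidates, key=lambda j: abs(train_x[j] - x))
    return train_y[best]

rng = random.Random(7)
train_x = sorted(rng.random() for _ in range(500))   # training sample covers [0, 1]
train_y = [true_f(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def line_model(x):
    # Rigid model: one global straight line.
    return a + b * x

def knn_model(x):
    # Flexible model: copies the value of the closest training point.
    return nn_predict(train_x, train_y, x)

def mse(predict, queries):
    return sum((predict(x) - true_f(x)) ** 2 for x in queries) / len(queries)

inside = [i / 200 for i in range(201)]        # interpolation queries in [0, 1]
outside = [2 + i / 200 for i in range(201)]   # extrapolation queries in [2, 3]

for name, model in (("line (high bias)", line_model), ("1-NN (high variance)", knn_model)):
    print(f"{name:22s} interpolation MSE={mse(model, inside):.4f}  "
          f"extrapolation MSE={mse(model, outside):.4f}")
```

In this setup the line should miss the wiggle inside [0, 1] (error on the order of 0.3²/2 ≈ 0.045) but stay close to the trend on [2, 3], while 1-NN is nearly exact inside [0, 1] and then flattens out at the last training value, missing by several units outside it.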

Now, the reason why deep neural nets, which are extremely high-variance models, are trained with large amounts of data, is that they can interpolate very well but can't extrapolate very well. If a model doesn't extrapolate very well but its training sample is a large enough chunk of instance space, it can still be very useful, because it's still representing a large number of instances.

How to put it? Maybe your high-variance model has seen examples of white dogs and black dogs in training, but no green dogs. Your model will not be able to generalise to green dogs, but if green dogs are rare, it will still be able to represent most dogs, so it's still useful.
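The green-dog point can be put in numbers with a toy setup. Assumptions: 1-NN regression stands in for the high-variance model, training inputs cover only [0, 1], 5% of queries fall in the uncovered region [2, 3], and a prediction within 0.1 of y = x² counts as correct. All names are hypothetical.

```python
import bisect
import random

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour regression; train_x must be sorted ascending."""
    i = bisect.bisect_left(train_x, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(train_x)]
    best = min(candidates, key=lambda j: abs(train_x[j] - x))
    return train_y[best]

def true_f(x):
    return x ** 2

rng = random.Random(0)
train_x = sorted(rng.random() for _ in range(1000))   # only [0, 1] is ever sampled
train_y = [true_f(x) for x in train_x]

# Assumed query mix: 95% from the covered region [0, 1] ("white and black dogs"),
# 5% from the uncovered region [2, 3] ("green dogs").
common = [i / 949 for i in range(950)]
rare = [2 + i / 49 for i in range(50)]

def is_correct(x, tol=0.1):
    # A prediction counts as correct if it is within tol of the true value.
    return abs(nn_predict(train_x, train_y, x) - true_f(x)) < tol

acc_common = sum(map(is_correct, common)) / len(common)
acc_rare = sum(map(is_correct, rare)) / len(rare)
acc_overall = sum(map(is_correct, common + rare)) / (len(common) + len(rare))
print(f"covered: {acc_common:.2%}  uncovered: {acc_rare:.2%}  overall: {acc_overall:.2%}")
```

Every covered query falls within tolerance and every uncovered one misses, so overall accuracy comes out at 95%: the model cannot generalise to the rare region, yet it is right on most queries.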

Of course, looking at the output of a trained model (its behaviour) doesn't tell you anything about what it was trained on. So a model that has very high accuracy on a large number of tasks will look impressive, even if it can't generalise at all.