Hacker News
by YeGoblynQueenne 2196 days ago
Bias and variance are characteristics of the model, not components of its error, as I think you're saying. In the simplest sense, bias and variance refer to the shape of the function represented by the model (let's say "the shape of the model" for simplicity). A model with a more "rigid" shape (approaching a straight line) has more bias and one with a more "relaxed" shape (further from a straight line) has more variance.
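To make "rigid" and "relaxed" a bit more concrete, here is a minimal sketch, assuming a toy setup that is mine and not from the discussion: a noisy sine curve as the "true" function, and polynomial fits of degree 1 and degree 9 standing in for the rigid and relaxed model. It uses the usual statistical estimates (how far the average fit sits from the truth, and how much the fit moves between training sets), which is only one way of putting numbers on the shape idea. The helper name `bias_variance` and all parameter values are made up for illustration.

```python
import numpy as np

def bias_variance(degree, n_points=30, n_resamples=200, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)   # fixed inputs (an assumption)
    truth = np.sin(np.pi * x)              # assumed "true" function
    preds = np.empty((n_resamples, n_points))
    for i in range(n_resamples):
        # Each resample is a fresh noisy training set drawn around the same truth.
        y = truth + rng.normal(0.0, noise_sd, size=n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x)
    # Squared gap between the average fit and the truth, averaged over x.
    bias_sq = float(np.mean((preds.mean(axis=0) - truth) ** 2))
    # How much the fit moves from one training set to the next, averaged over x.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

rigid_bias, rigid_var = bias_variance(degree=1)   # close to a straight line
flex_bias, flex_var = bias_variance(degree=9)     # free to bend
print(f"degree 1: bias^2={rigid_bias:.4f}, variance={rigid_var:.4f}")
print(f"degree 9: bias^2={flex_bias:.4f}, variance={flex_var:.4f}")
```

In this toy setup the straight-line fit should come out with the larger squared bias and the degree-9 fit with the larger variance.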

The extent to which a model can extrapolate to out-of-sample data depends on how well the shape of the model follows the true distribution of the data. This is true regardless of the bias and variance of the model. It just happens that most of the time, in interesting, real-world problems, the true distribution of the data differs to some degree from the sampling distribution of the training data, i.e. there's always some amount of "sampling error".

Sampling error can't be reduced by collecting more training data: you just have more data with the same sampling error. Increasing model complexity increases variance, so if you start with high sampling error, you will get a high error on out-of-sample data because your model matches the "off" distribution of the training data too closely. What training with more data and with a more complex model can do is increase the ability of the trained model to interpolate, i.e. to accurately represent (new) data points that are in the same region of "instance space" as the training data points.
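A rough sketch of the interpolation point, under assumptions of my own: the "instance space" is the interval [-1, 1], the truth is a sine curve, and the complex model is a degree-9 polynomial. The function `fit_and_score` and its parameters are hypothetical. It fits on 30 and then on 3000 training points from the same region and measures error inside that region and just outside it. Note that it only varies the amount of data, not the model complexity.

```python
import numpy as np

def fit_and_score(n_train, degree=9, noise_sd=0.3, seed=0):
    """Fit a polynomial on [-1, 1]; return (MSE inside, MSE outside) that region."""
    rng = np.random.default_rng(seed)
    truth = lambda x: np.sin(np.pi * x)           # assumed "true" function
    x_train = np.linspace(-1.0, 1.0, n_train)     # training region of instance space
    y_train = truth(x_train) + rng.normal(0.0, noise_sd, size=n_train)
    coeffs = np.polyfit(x_train, y_train, degree)

    def mse(x):
        return float(np.mean((np.polyval(coeffs, x) - truth(x)) ** 2))

    x_inside = np.linspace(-0.95, 0.95, 200)      # new points, same region
    x_outside = np.linspace(1.1, 1.5, 200)        # new points, unseen region
    return mse(x_inside), mse(x_outside)

small_in, small_out = fit_and_score(n_train=30)
big_in, big_out = fit_and_score(n_train=3000)
print(f"n=30:   inside={small_in:.5f}, outside={small_out:.3f}")
print(f"n=3000: inside={big_in:.5f}, outside={big_out:.3f}")
```

With more data the error inside the training region should shrink, while the error outside it stays larger than the error inside in both runs.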

A high-bias model can extrapolate well if the sampling error is not too high and the shape of the true distribution is not too irregular. However, a high-bias model will also not interpolate as well as a high-variance model. Its rigid structure will "miss" many data points. Like you say, this will not change if you train with more data. Anyway, that's the tradeoff.
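For the extrapolation half of that tradeoff, a small sketch, assuming a deliberately regular truth (a straight line, my choice) and again using polynomial degree as a stand-in for rigidity. The name `extrapolation_mse` and the numbers are illustrative only. Both models are trained on [-1, 1] and scored on [1.1, 1.5], which neither has seen.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = lambda x: 2.0 * x + 1.0                 # assumed, deliberately regular truth
x_train = np.linspace(-1.0, 1.0, 30)
y_train = truth(x_train) + rng.normal(0.0, 0.3, size=x_train.size)
x_far = np.linspace(1.1, 1.5, 50)               # outside the training region

def extrapolation_mse(degree):
    """Fit a polynomial on the training region and score it outside that region."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_far) - truth(x_far)) ** 2))

rigid_mse = extrapolation_mse(1)      # high-bias: straight line
flexible_mse = extrapolation_mse(9)   # high-variance: degree-9 polynomial
print(f"degree 1 extrapolation MSE: {rigid_mse:.4f}")
print(f"degree 9 extrapolation MSE: {flexible_mse:.4f}")
```

Here the straight line should extrapolate far better, because its shape happens to match the truth. With an irregular truth the picture would change.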

Now, the reason why deep neural nets, which are extremely high-variance models, are trained with large amounts of data is that they can interpolate very well but can't extrapolate very well. If a model doesn't extrapolate very well but its training sample is a large enough chunk of instance space, it can still be very useful, because it's still representing a large number of instances.

How to put it? Maybe your high-variance model has seen examples of white dogs and black dogs in training, but no green dogs. Your model will not be able to generalise to green dogs, but if green dogs are rare, it will still be able to represent most dogs, so it's still useful.

Of course, looking at the output of a trained model (its behaviour) doesn't tell you anything about what it was trained on. So a model that has very high accuracy on a large number of tasks will look impressive, even if it can't generalise at all.