| Bias and variance are characteristics of the model, not components of its
error, as I think you're saying. In the simplest sense, bias and variance
refer to the shape of the function represented by the model (let's say "the
shape of the model" for simplicity). A model with a more "rigid" shape
(approaching a straight line) has more bias and one with a more "relaxed"
shape (further from a straight line) has more variance. The extent to which a model can extrapolate to out-of-sample data depends on
how well the shape of the model follows the true distribution of the data.
This is true regardless of the bias and variance of the model. It just happens
that most of the time, in interesting, real-world problems, the true
distribution of the data is more or less different from the sampling
distribution of the training data, i.e. there's always some amount of
"sampling error". Sampling error can't be reduced by collecting more training data; you just
have more data with the same sampling error. Increasing model complexity
increases variance, so if you start with high sampling error, you will get a
high error on out-of-sample data because your model matches the "off"
distribution of the training data too closely. What training with more data
and with a more complex model can do is increase the ability of the trained
model to interpolate, i.e. to accurately represent (new) data points that
are in the same region of "instance space" as the training data points. A high-bias model can extrapolate well if the sampling error is not too high
and the shape of the true distribution is not too irregular. However, a
high-bias model will also not interpolate as well as a high-variance model. Its
rigid structure will "miss" many data points. Like you say, this will not
change if you train with more data. Anyway, that's the tradeoff. Now, the reason why deep neural nets, which are extremely high-variance
models, are trained with large amounts of data is that they can interpolate
very well but can't extrapolate very well. If a model doesn't extrapolate very
well but its training sample is a large enough chunk of instance space, it can
still be very useful, because it's still representing a large number of
instances. How to put it? Maybe your high-variance model has seen examples of white dogs
and black dogs in training, but no green dogs. Your model will not be able to
generalise to green dogs, but if green dogs are rare, it will still be able to
represent most dogs, so it's still useful. Of course, looking at the output of a trained model (its behaviour) doesn't
tell you anything about what it was trained on. So a model that has very high
accuracy on a large number of tasks will look impressive, even if it can't
generalise at all. |
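
If it helps to see the interpolation/extrapolation point in numbers, here's a rough sketch of the kind of toy experiment I have in mind. Everything in it is my own assumption for illustration: polynomial degree stands in for how "rigid" or "relaxed" the model shape is, `true_fn` is a made-up ground truth (a straight trend with a small wiggle), and the noise level and ranges are arbitrary. It fits a degree-1 and a degree-9 polynomial to the same training points, then scores each one inside the training range and well outside it.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Hypothetical "true" relationship: a linear trend plus a small wiggle.
    return x + 0.3 * np.sin(3.0 * x)


def mse(model, x):
    # Mean squared error of the model against the noise-free true function.
    return float(np.mean((model(x) - true_fn(x)) ** 2))


rng = np.random.default_rng(0)

# Training data covers only the region [0, 3] of "instance space".
x_train = np.linspace(0.0, 3.0, 40)
y_train = true_fn(x_train) + rng.normal(0.0, 0.1, x_train.size)

# New points inside the training region (interpolation)
# and well outside it (extrapolation).
x_inside = np.linspace(0.1, 2.9, 200)
x_outside = np.linspace(4.0, 6.0, 200)

results = {}
for degree in (1, 9):  # 1 = "rigid" shape, 9 = "relaxed" shape
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = {
        "interpolation_mse": mse(model, x_inside),
        "extrapolation_mse": mse(model, x_outside),
    }

for degree, scores in results.items():
    print(f"degree {degree}: {scores}")
```

Under these assumptions I'd expect the degree-9 fit to have the lower error inside the training range and a far higher error outside it, while the straight line misses the wiggle everywhere but stays in the right neighbourhood. It's a toy with one input dimension, so take it as an illustration of the idea and not as evidence about deep nets.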