by karpierz
1767 days ago
Suppose that you train a neural network to predict the next number in an arithmetic sequence (a, a+b, a+2b, a+3b, a+4b, ...). As input it gets two numbers, the previous number and the current number, and has to predict the next one. Suppose you had 1.4 trillion examples in the following test set (using a model with 175 billion parameters): (1,2)->3 (2,3)->4 (3,4)->5 ... Do you think it is possible to overfit and score perfectly on the test set, while failing to generalize?
If you're trying to fit some more complex space, where the starting value and the step are unknown and you're given two consecutive numbers in the sequence and must predict the third, then what you're trying to fit is `f(a, b) = a + 2(b - a)` (or `2b - a`, however you want to represent it), with `a` and `b` being the two given numbers. That's a swell function, but if you only give it data that can be equally represented by `f(a, b) = b + 1`, you're mis-training your model.
But you could once again do that with a model with a dozen parameters. In both cases, the issue isn't overfitting, but misrepresentative data.
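A minimal sketch of the point above, assuming plain Python functions stand in for trained models. The names `arithmetic_rule`, `plus_one_rule`, `linear_model`, and the step-3 held-out set are hypothetical, chosen only for illustration. It shows that `2b - a`, `b + 1`, and a whole family of 3-parameter linear models all score perfectly on `(n, n+1) -> n+2` pairs, yet only some of them generalize to a sequence with a different step:

```python
# Two candidate rules for "predict the next number from the previous two".
def arithmetic_rule(a, b):
    return 2 * b - a  # same as a + 2(b - a)

def plus_one_rule(a, b):
    return b + 1  # only correct when the step is 1

# Hypothetical 3-parameter linear model y = w1*a + w2*b + c. For any t, the
# weights (1 - t, t, 2 - t) reproduce every pair (n, n+1) -> n+2 exactly:
# (1 - t)*n + t*(n + 1) + (2 - t) = n + 2.
def linear_model(t):
    w1, w2, c = 1 - t, t, 2 - t
    return lambda a, b: w1 * a + w2 * b + c

# Training data in the style of the quoted example: (1,2)->3, (2,3)->4, ...
train = [((n, n + 1), n + 2) for n in range(1, 10_001)]

# Assumed held-out data: arithmetic sequences with step 3, never seen in training.
held_out = [((a, a + 3), a + 6) for a in range(1, 101)]

def accuracy(f, data):
    """Fraction of (inputs, target) pairs that f predicts exactly."""
    return sum(f(a, b) == y for (a, b), y in data) / len(data)

if __name__ == "__main__":
    candidates = {
        "2b - a": arithmetic_rule,
        "b + 1": plus_one_rule,
        "linear t=2": linear_model(2),    # equals 2b - a
        "linear t=1": linear_model(1),    # equals b + 1
        "linear t=0.5": linear_model(0.5),
    }
    for name, f in candidates.items():
        print(f"{name:>13}: train {accuracy(f, train):.2f}, "
              f"held-out {accuracy(f, held_out):.2f}")
```

Every value of `t` fits the training pairs perfectly, so the data alone cannot single out `2b - a`, no matter how many such examples there are. Among these exact fits, the one with the smallest weight norm is `t = 1` (that is, `b + 1`), since `(1 - t)^2 + t^2 + (2 - t)^2` is minimized at `t = 1`; a minimum-norm least-squares fit would therefore land on the rule that fails to generalize.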