HN Mirror


by joshuamorton 1768 days ago

I think you've specified this problem in a very strange way. But if you're saying that you're trying to train on the specific dataset where a = 1 and b = 1, then your model will fit the data perfectly with 175 billion parameters. It will also fit the data perfectly with, like, 15 parameters.

If you're trying to fit some more complex space, where a and b are unknown and you're given 3 numbers in the sequence, then what you're trying to fit is `f(a, b) = a + 2(b - a)` (or `2b - a`, however you want to represent it), which is a swell function. But if you only give it data that can be equally well represented by `f(a, b) = b + 1`, you're mis-training your model.

But you could once again do that with a model with a dozen parameters. In both cases, the issue isn't overfitting, but misrepresentative data.
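A minimal Python sketch of the misrepresentative-data point, assuming each example is the first two terms (a, b) of an arithmetic sequence and the label is the third term. The step-of-1 training pairs and the held-out pairs are made up for illustration. The two rules agree on every training pair, so nothing in that data can tell them apart, regardless of how many parameters the model has.

```python
def next_term(a, b):
    """The intended rule: continue the common difference (b - a)."""
    return a + 2 * (b - a)  # equivalently 2*b - a

def plus_one(a, b):
    """A shortcut rule that is only right when the common difference is 1."""
    return b + 1

# Hypothetical training pairs: every sequence steps by exactly 1.
train = [(a, a + 1) for a in range(-5, 6)]
# Hypothetical pairs with other step sizes, never seen during training.
held_out = [(1, 3), (0, 5), (10, 7)]

agree_on_train = all(next_term(a, b) == plus_one(a, b) for a, b in train)
agree_on_held_out = [next_term(a, b) == plus_one(a, b) for a, b in held_out]

print(agree_on_train)      # True
print(agree_on_held_out)   # [False, False, False]
```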


karpierz 1768 days ago

I didn't specify the training set, just the test set. It's possible that your model actually models an arithmetic series. Or that it simply overfits. The point is that it doesn't require trillions of parameters to overfit to a trillion-sized test set.


joshuamorton 1768 days ago

What you need are more parameters than the complexity of the underlying distribution. If the function you're modelling is linear, you only need a couple of parameters.
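To make the parameter-count point concrete, here is a sketch assuming the target is a plain arithmetic sequence indexed by position (the particular sequence below is invented). Ordinary least squares recovers it from a thousand terms with exactly two parameters, and those same two parameters extrapolate to the trillionth term; the size of the set you evaluate on doesn't change how many parameters you need.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + c: exactly two parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x

# Hypothetical data: terms 1..1000 of the sequence 4, 7, 10, ... (start 4, step 3).
positions = list(range(1, 1001))
terms = [4 + 3 * (n - 1) for n in positions]

w, c = fit_line(positions, terms)
print(w, c)  # 3.0 1.0

# The same two parameters give the trillionth term, 3 * 10**12 + 1.
trillionth = w * 10**12 + c
print(trillionth)  # 3000000000001.0
```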

"Overfitting" is memorizing the training data instead of generalizing. The example you're providing isn't overfitting; it's just generalizing to the wrong function. Overfitting would be if the training set was, say, 30 random values that you got right, but you didn't get other values along the same line correct.
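A rough sketch of overfitting in this sense, with every specific assumed for illustration: 30 training points sampled from the hypothetical line y = 2x + 1 plus Gaussian noise, fit by the unique degree-29 polynomial through them (30 coefficients, one per point, built with Lagrange interpolation). It reproduces every training point exactly but lands far from the line at the midpoints between them, i.e. at other values along the same line.

```python
import random

def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

def true_line(x):
    return 2.0 * x + 1.0  # hypothetical underlying "simple" function

rng = random.Random(0)                    # fixed seed so the run is repeatable
train_x = [float(i) for i in range(30)]   # 30 training inputs
train_y = [true_line(x) + rng.gauss(0.0, 1.0) for x in train_x]  # noisy labels

# Training error: the 30-coefficient polynomial passes through every noisy point.
train_err = max(abs(lagrange_eval(train_x, train_y, x) - y)
                for x, y in zip(train_x, train_y))

# Validation: midpoints between training inputs, scored against the true line.
val_x = [x + 0.5 for x in train_x[:-1]]
val_err = max(abs(lagrange_eval(train_x, train_y, x) - true_line(x)) for x in val_x)

print(f"max train error: {train_err:.2e}")
print(f"max validation error: {val_err:.2e}")
```

The huge validation error comes mostly from the midpoints near the ends of the range, where a high-degree polynomial through equally spaced points oscillates wildly.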

> I didn't specify the training set, just the test set

Then, unless you constructed the training set with the intent of mistraining the model, I think a model that got good accuracy on that validation set would generalize.

> The point is that it doesn't require trillions of parameters to overfit to a trillion-sized test set.

You can't "overfit" a validation set, unless you've done something wrong. Overfitting is, by definition, learning the training set too well such that you fail to generalize to a validation set.


karpierz 1768 days ago

Overfitting is, by definition, learning a model that doesn't generalize to the distribution of inputs you care about. If your validation set has the same distribution as the inputs you care about, then your definition holds. But that's definitely not true in practice. Usually the data you collect won't be exactly representative of the conditions you're looking to test, unless your problem is very simple.


joshuamorton 1768 days ago

> Overfitting is, by definition, learning a model that doesn't generalize to the distribution of inputs you care about.

No, that's just mis-modelling. Overfitting is specifically doing so in a way that learns the training data too well, at the cost of generalizing. If you try to have a single-layer perceptron learn a nonlinear function, it will fail to generalize. But it certainly isn't "overfitting".
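A small sketch of the perceptron case, using XOR as the assumed nonlinear target and AND as a linearly separable control. Because no single line separates the XOR points, the perceptron can't get all four right even on its own training data. That is the signature of a model that lacks capacity, not of one that has memorized its training set.

```python
# Truth tables: AND is linearly separable, XOR is not.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def predict(w, b, x):
    """Single-layer perceptron: a thresholded weighted sum."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(data, epochs=100, lr=1.0):
    """Classic perceptron rule: nudge the weights toward every misclassified point."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)  # -1, 0, or +1
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def accuracy(data, w, b):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print(and_acc)  # 1.0
print(xor_acc)  # at most 0.75: no line separates XOR, even on the training data
```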

Overfitting is not the only form of mistake when training a model. You've presented a different one, which is just like trying to train on misrepresentative data. But that isn't "overfitting", it's just having bad data. Your model isn't "failing to generalize", it has nothing to generalize over.

The classic demonstration of this is that overfitting usually results in an accuracy curve that "frowns" on validation data: your accuracy peaks, but then decreases as you learn the structure of the training data instead of the general structure. In your example that won't happen.

Training a model on the wrong problem isn't overfitting. In fact, your example is more like underfitting than overfitting. The model in your example would fail to see the full complexity of the structure, instead of, as in overfitting, making it more complicated than reality.
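A toy contrast between the two failure modes, with the parabola target and the even/odd train/validation split chosen purely for illustration: fitting a straight line to y = x^2 on symmetric inputs yields a flat line, and its error is large on the training data as well as on the validation data. That is the underfitting signature (the model is too simple for the structure), whereas in overfitting the training error is near zero and it is the validation error that blows up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x

def mse(xs, ys, w, c):
    """Mean squared error of the line y = w * x + c on (xs, ys)."""
    return sum((w * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def target(x):
    return x * x  # hypothetical "true" structure: a parabola

train_x = [float(x) for x in range(-10, 11, 2)]  # -10, -8, ..., 10
val_x = [float(x) for x in range(-9, 10, 2)]     # -9, -7, ..., 9
train_y = [target(x) for x in train_x]
val_y = [target(x) for x in val_x]

w, c = fit_line(train_x, train_y)
train_mse = mse(train_x, train_y, w, c)
val_mse = mse(val_x, val_y, w, c)

print(w, c)       # 0.0 40.0: the best line is flat
print(train_mse)  # 1248.0: high even on the training data
print(val_mse)    # 893.8: no train/validation gap to speak of
```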
