Hacker News
by elanning 1771 days ago
As I understand it, overfitting means low bias but high variance. The textbook case is perfectly fitting 5 noisy, roughly linear data points with a high-degree polynomial when the underlying function is a line; the polynomial then doesn't generalize well to data points outside the training set. Your model seems to be fitting points in the training set and the evaluation set just fine. Of course, if different Batmen were in the evaluation set, it would suddenly do terribly, but you can do that to pretty much every machine learning model. That would violate a lot of the underlying assumptions of statistics and machine learning, e.g. i.i.d. samples and evaluation and training sets being drawn from the same distribution. Your definition of overfitting thus seems more like transfer learning, in some sense.
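A minimal sketch of the polynomial example above, assuming NumPy. The line `2x + 1`, the fixed noise values, and the held-out points are all made up for illustration and do not come from the article. It fits the same 5 points with a degree-1 and a degree-4 polynomial and compares training error with error on held-out points from the same range.

```python
import numpy as np

def true_line(x):
    # Hypothetical underlying function: a straight line.
    return 2.0 * x + 1.0

# Five training points on the line, plus fixed (made-up) noise.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
noise = np.array([0.5, -0.5, 0.5, -0.5, 0.5])
y_train = true_line(x_train) + noise

# Held-out points from the same range, evaluated against the true line.
x_test = np.array([0.5, 1.5, 2.5, 3.5])
y_test = true_line(x_test)

def mse(coeffs, x, y):
    # Mean squared error of a polynomial (highest-degree coefficient first).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

line_fit = np.polyfit(x_train, y_train, deg=1)  # matches the true model class
poly_fit = np.polyfit(x_train, y_train, deg=4)  # passes through all 5 points

for name, coeffs in [("degree 1", line_fit), ("degree 4", poly_fit)]:
    print(name,
          "train MSE:", mse(coeffs, x_train, y_train),
          "test MSE:", mse(coeffs, x_test, y_test))
```

The degree-4 fit drives the training error to essentially zero but does worse than the line on the held-out points, which is the low-bias, high-variance pattern described above.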

I see what you are saying, but in that context you lose most people's intuitive definition of overfitting. If I train a model on a single image as my train set, then change one random pixel and run the model on that image as the eval set, your argument would be that this is not overfitting, because the model performs well on the eval set it was created for.

My argument is that, compared to models as most people use them, micro-models are low bias and high variance, and thus overfit. That's why I draw a distinction between a Batman model and a Batman micro-model.

On your target data your model is not high variance.

The way you use over-fitting is misleading. In fact, according to the article, the model is fit just right for its purpose. If it were fit any less, given the five pictures, it might not work at all. Your confusion arises because what you are actually changing is the objective and the data-generating process (DGP) in question.

It should be clear to anyone that over-fitting and under-fitting are conceptually tied to the DGP under consideration. It makes no sense to speak of a model being "generally over fit" (!).

An "intuitive definition" of over-fitting that does not take into account this crucial fact will always be problematic.

For instance, training a model to zero error does not imply that it is overfit. If your training set is broad enough and the production environment has exactly the same underlying DGP, then the model is simply fit well. In practice, the training data usually does not cover all the data from the latent DGP that the model eventually encounters, and that is why such a model would typically be overfit.
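To make the dependence on the DGP concrete, here is a rough sketch, assuming NumPy. The target function `x ** 2`, the input ranges, and the nearest-neighbour "memorizer" are all hypothetical and only stand in for a model with zero training error. The same model is scored on data from the training range and on data from a different range.

```python
import numpy as np

def target(x):
    # Hypothetical ground-truth function shared by both DGPs.
    return x ** 2

# Dense training set covering the narrow range [0, 1].
x_train = np.linspace(0.0, 1.0, 101)
y_train = target(x_train)

def predict(x):
    # 1-nearest-neighbour lookup: return the label of the closest
    # training input, so training error is zero by construction.
    nearest = np.abs(x[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def mse(x):
    return float(np.mean((predict(x) - target(x)) ** 2))

x_same = np.linspace(0.005, 0.995, 100)  # unseen inputs, same range as training
x_shifted = np.linspace(2.0, 3.0, 100)   # inputs from a different range

train_error = mse(x_train)
same_error = mse(x_same)
shifted_error = mse(x_shifted)

print("train:", train_error)
print("same DGP:", same_error)
print("shifted DGP:", shifted_error)
```

With zero training error in both cases, the model looks well fit against the first evaluation set and badly overfit against the second, so the label depends on which DGP you hold it to.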

However, in this case, the model does not seem to fail on any DGP that corresponds to the task: Identifying one type of Batman. It is therefore not overfit.

I am sorry, but OP is right.

Fair enough, but the target data in this sense IS a full distribution of Batmen. This approach works towards the goal of creating a broad dataset and fitting a full Batman model. We are training on a narrower subset of our actual target data and fitting to that narrow subset; whether you want to call that overfitting or not depends, I suppose, on your perspective.

I agree that intuitive definitions are often murky, but given that we are already throwing in murky notions of intention that are implicit in the word "target", I think at least a colloquial usage of overfitting is appropriate.

Sometimes we try micro-models on broader domains than what we expect they will work for, and they work fine. Sometimes not. The point is that the target here is not well defined because we are just using them as annotation tools with some human supervision and not in a "typical" production environment.