Hacker News
by JuettnerDistrib 1312 days ago
Great article. I have two comments:

1. Procrastination seems to be a type of early stopping. I knew I had a good strategy in school!

2. Something that seems to be sorely missing in machine learning (I'm not an ML expert) is error bars. If you take the example of the figure at the end, as you increase the number of parameters in the model, your error bars become larger (at least in the overfitting regime), and they are infinite when you have more parameters than data points. Indeed, chi^2 tests are usually used in physics/astro to test for this. Of course, you need error bars on the data points to do this. So perhaps the difficulty is really in assigning meaningful uncertainties to your pictures/test scores/politicians.
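To make the chi^2 point concrete, here is a minimal sketch of the kind of fit I have in mind. It is a toy, not anything from the article: the data (a noisy sin(2x) on 12 points), the known per-point sigma of 0.1, the Legendre polynomial model, and the helper names `fit_with_error_bars` and `prediction_sigma` are all my own assumptions. It fits by weighted least squares, reports chi^2 and the degrees of freedom, and propagates the parameter covariance to an error bar on the fitted curve.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_with_error_bars(x, y, sigma, n_params):
    """Weighted least-squares fit of a Legendre series with n_params coefficients.

    Returns (coeffs, cov, chi2, dof). cov is None when the design matrix is
    rank deficient (e.g. more parameters than data points), in which case the
    parameter error bars are unbounded.
    """
    # Design matrix: column j is the j-th Legendre polynomial evaluated at x.
    A = legendre.legvander(x, n_params - 1)
    # Scale rows by 1/sigma so ordinary least squares minimises chi^2.
    Aw = A / sigma[:, None]
    yw = y / sigma
    coeffs, _, rank, _ = np.linalg.lstsq(Aw, yw, rcond=None)
    chi2 = float(np.sum((yw - Aw @ coeffs) ** 2))
    dof = len(x) - n_params
    # The covariance (A^T W A)^-1 exists only if the columns are independent.
    cov = np.linalg.inv(Aw.T @ Aw) if rank == n_params else None
    return coeffs, cov, chi2, dof


def prediction_sigma(x0, cov):
    """1-sigma uncertainty of the fitted curve at x0, propagated from cov."""
    phi = legendre.legvander(np.atleast_1d(x0), cov.shape[0] - 1)[0]
    return float(np.sqrt(phi @ cov @ phi))


# Toy data (assumed): 12 points, known Gaussian noise of 0.1 on each.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
sigma = np.full_like(x, 0.1)
y = np.sin(2.0 * x) + rng.normal(0.0, sigma)

for n_params in (2, 4, 8, 12, 14):
    coeffs, cov, chi2, dof = fit_with_error_bars(x, y, sigma, n_params)
    if cov is None:
        print(f"{n_params:2d} params: underdetermined, error bars unbounded")
    else:
        print(f"{n_params:2d} params: chi2={chi2:8.2f} dof={dof:2d} "
              f"sigma_fit(0.95)={prediction_sigma(0.95, cov):.3f}")
```

For nested linear models like this one, chi^2 can only go down and the error bar on the fitted curve can only go up as you add parameters. Once there are more parameters than points, the covariance matrix no longer exists, which is the "infinite error bars" case.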


> as you increase the number of parameters in the model, your error bars become larger

In large neural nets the effect is reversed. The larger the model, the better it generalises, even from the same training data.

> The larger the model, the better it generalises, even from the same training data

Do you have some references for this claim? For me, it seems counterintuitive.

It is very counterintuitive. It is also a very common observation that has kept taking people by surprise for almost two decades now. At the beginning, people were very resistant to the idea, even when every experiment confirmed it.

The catch is that you need a huge amount of data to train those.

It also seems to have limits. There have been a few well-documented cases where our current huge and very well trained networks got error rates that were lower than the rate of mislabeling in the data.

Can’t provide a reference, but I can confirm that this is common knowledge. It’s why e.g. GPT-3 outperforms GPT-2.

Though as Stable Diffusion shows, network architecture still matters a lot!

Note that the article points out you’ll get more overfitting as your number of parameters approaches the number of training points, which is what I suspect you’ve seen. The trend does reverse later on, but only once the parameter count is orders of magnitude beyond that point, and I don’t know if that ever happens outside of ML. It’s a lot of parameters.
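If you want to poke at this yourself, here is a rough sketch under my own assumptions (nothing here comes from the article): a noisy linear target in 5 dimensions, 40 training points, fixed random ReLU features, and a minimum-norm least-squares fit via `np.linalg.pinv`. The function names are made up for illustration. Sweeping the number of features through the number of training points is a common toy setup for seeing the overfitting peak and the later recovery.

```python
import numpy as np


def random_relu_features(X, W, b):
    """Fixed (untrained) random ReLU features: max(0, X @ W + b)."""
    return np.maximum(0.0, X @ W + b)


def train_and_test_mse(n_features, n_train=40, n_test=500, dim=5,
                       noise=0.3, seed=0):
    """Fit least squares on random features; return (train_mse, test_mse)."""
    rng = np.random.default_rng(seed)
    # Data is drawn before the features, so a given seed always yields
    # the same training and test sets regardless of n_features.
    w_true = rng.normal(size=dim)
    X_tr = rng.normal(size=(n_train, dim))
    X_te = rng.normal(size=(n_test, dim))
    y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
    y_te = X_te @ w_true  # noise-free targets for measuring generalisation
    W = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
    b = rng.normal(size=n_features)
    F_tr = random_relu_features(X_tr, W, b)
    F_te = random_relu_features(X_te, W, b)
    # pinv gives the ordinary least-squares solution when n_features < n_train
    # and the minimum-norm interpolating solution when n_features > n_train.
    coef = np.linalg.pinv(F_tr) @ y_tr
    train_mse = float(np.mean((F_tr @ coef - y_tr) ** 2))
    test_mse = float(np.mean((F_te @ coef - y_te) ** 2))
    return train_mse, test_mse


for p in (5, 10, 20, 40, 80, 320, 1280):
    tests = [train_and_test_mse(p, seed=s)[1] for s in range(20)]
    print(f"{p:5d} features: median test MSE = {np.median(tests):.3f}")
```

I would expect the test error to be worst somewhere around 40 features (the number of training points) and to come back down well past it, but treat that as something to check on your own run rather than a guarantee. The exact numbers depend on the seeds, the noise level, and the feature scaling.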