Hacker News
by visarga 1312 days ago
> as you increase the number of parameters in the model, your error bars become larger

In large neural nets the effect is reversed. The larger the model, the better it generalises, even from the same training data.

> The larger the model, the better it generalises, even from the same training data

Do you have some references for this claim? For me, it seems counterintuitive.

It is very counterintuitive. It is also a very common observation that has kept surprising everybody for almost two decades now. At the beginning, people were very resistant to the idea, even when every experiment confirmed it.

The catch is that you need a huge amount of data to train those.

It also seems to have limits. There have been a few well-documented cases where current huge, very well-trained networks reached error rates that were lower than the rate of mislabeling in the data.

Can’t provide a reference, but I can confirm that this is common knowledge. It’s why e.g. GPT-3 outperforms GPT-2.

Though as Stable Diffusion shows, network architecture still matters a lot!

Note that the article points out you’ll get more overfitting as your number of parameters approaches the size of the training set, which is what I suspect you’ve seen. The trend does reverse later on, but only once the parameter count is orders of magnitude beyond that point, and I don’t know if that ever happens outside of ML. It’s a lot of parameters.
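
If you want to see that reversal yourself, here is a rough toy sketch, not anything from the article. It assumes a setup I made up for illustration: noisy linear targets, random ReLU features, and a minimum-norm least-squares fit via `np.linalg.pinv`. The function name `median_test_error` and every default (40 training points, 10 input dimensions, noise level 0.5) are hypothetical choices. The idea is to sweep the number of features past the number of training points and watch the test error.

```python
import numpy as np

def median_test_error(n_features, n_train=40, n_test=500, dim=10,
                      noise=0.5, trials=30, seed=0):
    """Median test MSE of a min-norm least-squares fit on random ReLU features.

    All sizes and the noise level are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        # Ground truth: a random linear function of the inputs.
        w_true = rng.normal(size=dim) / np.sqrt(dim)
        x_train = rng.normal(size=(n_train, dim))
        x_test = rng.normal(size=(n_test, dim))
        y_train = x_train @ w_true + noise * rng.normal(size=n_train)
        y_test = x_test @ w_true  # noiseless targets for evaluation

        # Fixed random projection followed by ReLU: the "model size" knob.
        proj = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
        phi_train = np.maximum(x_train @ proj, 0.0)
        phi_test = np.maximum(x_test @ proj, 0.0)

        # pinv gives the least-squares fit when n_features < n_train and
        # the minimum-norm interpolating fit when n_features > n_train.
        beta = np.linalg.pinv(phi_train) @ y_train
        errors.append(np.mean((phi_test @ beta - y_test) ** 2))
    # Median rather than mean: errors near the threshold are heavy-tailed.
    return float(np.median(errors))

if __name__ == "__main__":
    for p in (5, 20, 40, 80, 400, 2000):
        print(p, median_test_error(p))
```

In this kind of setup the error typically spikes when the feature count is close to the number of training points (40 here) and comes back down once the model is far larger. It is only a linear toy, so take it as an illustration of the shape of the curve rather than evidence about deep nets.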