Hacker News new | ask | show | jobs
by RC_ITR 1016 days ago
Not to be a downer, but wasn’t one of OpenAI’s earliest discoveries that training small models on huge datasets leads to over-fitting?

It’s my understanding that the entire race to ever-more parameters was driven by that.

2 comments

A workaround to overfitting is to train on so much distinct data that the model can't overfit.

Newer large datasets like the ones used here optimize for diversity. (e.g. SlimPajama is a heavily-deduped dataset)

Learn about the magic of double descent
https://openai.com/research/deep-double-descent

Yeah, the line keeps going down as the model gets bigger. What's your point? That there's a hump in the middle?