Hacker News new | ask | show | jobs
by leereeves 831 days ago
In smaller models, not having enough training data for the model size leads to overfitting. The model predicts the training data better than ever, but generalizes poorly and performs worse on new inputs.

Is there any reason to think the same thing wouldn't happen in billion parameter LLMs?

1 comments

This happens in smaller models because you reach parameter saturation very quickly. In modern LLMs and with current datasets, it is very hard to even reach this point, because the total compute time boils down to just a handful of epochs (sometimes even less than one). It would take tremendous resources and time to overtrain GPT4 in the same way you would overtrain convnets from the last decade.