by leereeves
831 days ago
In smaller models, not having enough training data for the model size leads to overfitting. The model predicts the training data better than ever, but generalizes poorly and performs worse on new inputs. Is there any reason to think the same thing wouldn't happen in billion-parameter LLMs?
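To make the small-model case concrete, here is a toy sketch in plain Python. It is only an illustration under stated assumptions: the data, the alternating ±0.3 offsets standing in for noise, and every function name are made up for this example, and it says nothing either way about what happens in billion-parameter LLMs. It fits the same 8 points with a 2-parameter line and with an 8-parameter polynomial that passes through every point.

```python
# Toy overfitting demo: 8 "noisy" samples of a straight line, fit two ways.
# All names and numbers here are illustrative assumptions.

def true_fn(x):
    return 2.0 * x + 1.0

n = 8
train_x = [i / (n - 1) for i in range(n)]
# Deterministic stand-in for noise: alternating +0.3 / -0.3 offsets.
train_y = [true_fn(x) + (0.3 if i % 2 == 0 else -0.3)
           for i, x in enumerate(train_x)]

# New inputs: midpoints between training points, with noise-free targets.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [true_fn(x) for x in test_x]

def fit_line(xs, ys):
    """Ordinary least-squares line: 2 parameters."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: len(xs) parameters."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)

print("line:        train MSE", mse(line, train_x, train_y),
      "test MSE", mse(line, test_x, test_y))
print("interpolant: train MSE", mse(interp, train_x, train_y),
      "test MSE", mse(interp, test_x, test_y))
```

The larger model reproduces the training points exactly, but its error on the in-between inputs is far higher than the line's, which is the pattern described above.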