Hacker News
by SuchAnonMuchWow 795 days ago
No, it's the opposite: overfitting is the result of either having too many weights relative to the size of your dataset, or training for a long time while reusing/transforming parts of your dataset to make it last longer.

Having a huge dataset compared to the size of your network will reduce overfitting.
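
To make that concrete, here is a toy sketch (not anything from Meta's training setup): a fixed-size model, a degree-9 polynomial with 10 weights, is fit once on 12 points and once on 2000 points. The target function, noise level, degree, and the name `train_test_mse` are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def train_test_mse(n_train, degree=9, noise=0.3, seed=0):
    """Fit a fixed-capacity polynomial and return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Hypothetical task: noisy samples of sin(3x) on [-1, 1].
        x = rng.uniform(-1.0, 1.0, n)
        y = np.sin(3.0 * x) + rng.normal(0.0, noise, n)
        return x, y

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(2000)  # held-out data from the same distribution

    # Same number of weights (degree + 1) no matter how much data we have.
    model = Polynomial.fit(x_tr, y_tr, degree)

    train_mse = float(np.mean((model(x_tr) - y_tr) ** 2))
    test_mse = float(np.mean((model(x_te) - y_te) ** 2))
    return train_mse, test_mse


for n in (12, 2000):
    tr, te = train_test_mse(n)
    print(f"n_train={n:5d}  train_mse={tr:.3f}  test_mse={te:.3f}")
```

With only 12 points the model can nearly memorize the noise, so the train error should sit well below the test error; with 2000 points the two should end up close together. The exact numbers depend on the seed.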

We don't actually know how big the dataset is, right? It could be the same dataset used for Llama 2, but trained for more epochs.
The dataset is 7 times bigger than the one used for Llama 2, as reported by Meta.
Has Meta disclosed how much of the dataset was repeated? I've only seen the "number of tokens trained" figure.