by Straw
1018 days ago
The validation curves will look identical to the training curves. These models are far too small to overfit the training set. With a large enough model and many epochs you can certainly get overfitting, but for a single epoch the val/train curves look exactly the same, and I'd expect that a 7B model will never overfit on 2T tokens no matter how many epochs you run.
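If you want to check this on your own run rather than take my word for it, here is a minimal sketch of comparing the two curves. The function names, the `tolerance` threshold, and the loss values are all made up for illustration; they are not from any real training run or library.

```python
def generalization_gap(train_losses, val_losses):
    """Return validation loss minus training loss at each logged checkpoint."""
    if len(train_losses) != len(val_losses):
        raise ValueError("curves must be logged at the same checkpoints")
    return [v - t for t, v in zip(train_losses, val_losses)]


def looks_overfit(train_losses, val_losses, tolerance=0.05):
    """Flag overfitting when the final gap exceeds `tolerance`.

    `tolerance` is a hypothetical threshold; pick one that matches the
    noise level of your own evaluation.
    """
    return generalization_gap(train_losses, val_losses)[-1] > tolerance


# Made-up curves: one where val tracks train, one where val turns upward.
tracking_train = [3.2, 2.6, 2.3, 2.1]
tracking_val = [3.2, 2.6, 2.3, 2.1]
diverging_train = [3.2, 2.4, 1.6, 0.9]
diverging_val = [3.2, 2.5, 2.6, 2.9]

print(looks_overfit(tracking_train, tracking_val))
print(looks_overfit(diverging_train, diverging_val))
```

The first pair is what I'd expect to see for a single epoch; the second is the shape you'd only get with a big enough model and enough repeated passes over the data.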