by minimaxir
902 days ago
> Wouldn’t we need more information on why they decided to stop training at this point to conclude that?

The experiment was fixed at 3 epochs on 1T tokens, they didn't decide to "stop" at a given criterion.

> we don’t know which without looking at validation loss, which is like a second set of test data the model hasn’t seen before.

The data I linked shows the validation loss, which has the same behavior as the training loss.