by minimaxir 902 days ago
> Wouldn’t we need more information on why they decided to stop training at this point to conclude that?

The experiment was fixed at 3 epochs on 1T tokens; they didn't decide to "stop" based on some criterion.

> we don’t know which without looking at validation loss, which is like a second set of test data the model hasn’t seen before.

The data I linked shows the validation loss, which has the same behavior as the training loss.

1 comments

I'd love to see someone go for another few epochs in the future. Two of the benchmarks got a significant jump almost at the end of training. I wonder if there's a chance for more of that - looks like an interesting effect on its own.

The jump was due to them fixing a bug. There’s a footnote about it at the bottom of page 5.

In the Discord, they mentioned a TinyLlama v2; presumably that would have this bug (and another bug, see the footnote on page 4) fixed.