Hacker News
by
alexedw
1021 days ago
This is silly. Look at the loss and benchmark curves for the Pythia suite of models: the smaller models certainly did saturate, and in fact began worsening.
A 7B model not saturating at 2T tokens is very different from a 1B model not saturating at 3T tokens.
littlestymaar
1020 days ago
That's the point of the experiment actually…