HN Mirror



	by alexedw 1021 days ago
	This is silly. Look at the loss and benchmark curves for the Pythia suite of models: the smaller models certainly did saturate and in fact began to get worse. 2T tokens not saturating a 7B model is very different from 3T tokens on a 1B model.
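One rough way to see the gap the comment is pointing at is training tokens per parameter. The sketch below assumes nominal sizes of exactly 7B and 1B parameters (real models differ slightly). Treating tokens-per-parameter as a crude proxy for how close a run is to saturation is an assumption made here for illustration, not something the comment states.

```python
# Tokens seen per model parameter for the two training runs contrasted above.
# Assumption: nominal sizes (7B, 1B params; 2T, 3T tokens), not exact counts.
runs = {
    "7B params, 2T tokens": (7e9, 2e12),
    "1B params, 3T tokens": (1e9, 3e12),
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Return the number of training tokens per model parameter."""
    return tokens / params


for name, (params, tokens) in runs.items():
    print(f"{name}: {tokens_per_param(params, tokens):,.0f} tokens/param")

# How many times more data per parameter the small model sees.
ratio = tokens_per_param(1e9, 3e12) / tokens_per_param(7e9, 2e12)
print(f"ratio: {ratio:.1f}x")
```

Under these assumptions the 7B run sees roughly 286 tokens per parameter and the 1B run 3,000, about 10.5 times as many.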

1 comment

That's the point of the experiment actually…