| HN Mirror



	by zone411 1277 days ago
	No. It shows the opposite. All model sizes converged to a similar loss as compute increased toward the maximum. But larger models had higher loss for a given compute budget. Their text about Figure 3 confirms what I'm saying: "We find a clear valley in loss, meaning that for a given FLOP budget there is an optimal model to train."

1 comment

peaslock 1274 days ago

Yes, but the losses in Figure 3 increase because, with the FLOP budget held constant, the larger models are trained on less data, not because of overfitting. Large models do not overfit very much, so at a constant dataset size a larger model will still reach a lower loss than a smaller one.
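
To make the fixed-budget trade-off concrete, here is a rough sketch. It assumes the common approximation that training compute is about 6 * N * D FLOPs (N parameters, D training tokens); that rule of thumb, the budget, and the model sizes below are my own illustrative assumptions, not numbers from the thread or from Figure 3.

```python
# Rough sketch: how many tokens each model size can see under one FLOP budget.
# Assumption (not from the thread): training compute C is roughly 6 * N * D,
# where N is the parameter count and D is the number of training tokens.

def tokens_for_budget(flops: float, n_params: float) -> float:
    """Tokens a model with n_params parameters can train on within `flops`."""
    return flops / (6.0 * n_params)

budget = 1e21                    # hypothetical fixed FLOP budget
model_sizes = [1e8, 1e9, 1e10]   # hypothetical parameter counts

tokens = [tokens_for_budget(budget, n) for n in model_sizes]
for n, d in zip(model_sizes, tokens):
    print(f"{n:.0e} params -> {d:.2e} tokens")
```

Under this approximation, every 10x increase in parameters cuts the data the model sees by 10x at the same budget, which is the effect described above: the right side of the valley comes from undertraining, not overfitting.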
