

	by sp332 1019 days ago
	Chinchilla predicts that you could get lower loss by training a larger model on that amount of data. But the model size in this case was chosen for other reasons, mostly inference speed and fine-tuning cost. So it's just irrelevant here.

1 comment

GaggiX 1019 days ago

Well, it's relevant if you want to compare this parameter-bound model against one trained compute-optimally with the same amount of compute, to see how much you're trading off.
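
A rough sketch of that comparison, assuming the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta, and the usual C ≈ 6·N·D estimate for training FLOPs. The constants are the published fit as I recall it, and later re-fits give somewhat different values, so treat them as an assumption. The 7B-parameter / 1T-token model is a hypothetical stand-in, not the model discussed in this thread.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**ALPHA + B / D**BETA
# Constants: assumed values of the paper's published fit (not from this thread).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (N, D) minimizing loss subject to 6 * N * D == flops.

    Setting dL/dN = 0 with D = flops / (6 * N) gives
    N = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
        * (flops / 6) ** (BETA / (ALPHA + BETA)).
    """
    budget = flops / 6.0
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * budget ** (BETA / (ALPHA + BETA))
    return n_opt, budget / n_opt


# Hypothetical size-constrained model (illustrative numbers only).
n_fixed, d_fixed = 7e9, 1e12
flops = 6.0 * n_fixed * d_fixed

n_opt, d_opt = compute_optimal(flops)
loss_fixed = loss(n_fixed, d_fixed)
loss_opt = loss(n_opt, d_opt)

print(f"fixed-size model: N={n_fixed:.3g}, D={d_fixed:.3g}, loss={loss_fixed:.4f}")
print(f"compute-optimal:  N={n_opt:.3g}, D={d_opt:.3g}, loss={loss_opt:.4f}")
print(f"predicted loss traded away: {loss_fixed - loss_opt:.4f}")
```

The last line is the "how much you're trading" number: the predicted loss gap at equal compute. It says nothing about inference speed or fine-tuning cost, which are what the smaller model buys.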
