HN Mirror



	by cubefox 1141 days ago
	The Chinchilla scaling law describes, apart from the training data size, the optimal number of parameters for a given amount of computing power for training. See https://dynomight.net/scaling/
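	A rough sketch of what that trade-off looks like numerically. It assumes two widely quoted rules of thumb rather than the fitted law itself: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and roughly 20 tokens per parameter at the optimum. The function name and constants are illustrative only.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # assumption: C ≈ 6 * N * D
TOKENS_PER_PARAM = 20.0      # assumption: D ≈ 20 * N at the compute optimum


def chinchilla_optimal(compute_flops):
    """Return (params, tokens) for a training FLOP budget under the assumptions above."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N^2.
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# A budget of about 5.76e23 FLOPs lands near 70B parameters and 1.4T tokens.
params, tokens = chinchilla_optimal(5.76e23)
print(f"{params:.3g} params, {tokens:.3g} tokens")
```

	Under these assumptions both the parameter count and the token count grow with the square root of the compute budget.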

1 comment

sp332 1141 days ago

For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original LLaMA models were trained well past the Chinchilla-optimal amount of data.
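To put a rough number on "well past": a minimal sketch, assuming the common rule of thumb of about 20 training tokens per parameter as the Chinchilla-optimal ratio, and using nominal figures of 7B parameters and 1T training tokens for the smallest original LLaMA model. The helper name is made up for illustration.

```python
TOKENS_PER_PARAM_OPTIMAL = 20.0  # assumption: rule-of-thumb Chinchilla ratio


def overtraining_factor(params, tokens_trained):
    """How many times more tokens were used than the ~20 tokens/param rule suggests."""
    return (tokens_trained / params) / TOKENS_PER_PARAM_OPTIMAL


# Nominal figures (assumed for illustration): 7e9 parameters, 1e12 tokens.
factor = overtraining_factor(7e9, 1e12)
print(f"{factor:.1f}x the rule-of-thumb token count")
```

The extra training compute is spent once, while the smaller model is cheaper on every inference call afterwards.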
