| HN Mirror



	by Silverback_VII 1141 days ago
	I'm not sure whether the number of parameters serves as a reliable measure of quality. I believe that these models have a lot of redundant computation and could be a lot smaller without losing quality.

1 comments

cubefox 1141 days ago

The Chinchilla scaling law describes, for a given training compute budget, both the optimal amount of training data and the optimal number of parameters. See

https://dynomight.net/scaling/
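
To make that concrete, here is a rough sketch of the compute-optimal split. It is not taken from the linked article; it assumes the common approximation that training cost is about 6 FLOPs per parameter per token (C ≈ 6·N·D) and the widely quoted rule of thumb of roughly 20 training tokens per parameter. The function name and constants are just illustrative.

```python
import math

# Assumptions (rules of thumb, not exact fitted values):
FLOPS_PER_PARAM_TOKEN = 6.0   # training compute C ≈ 6 * N * D
TOKENS_PER_PARAM = 20.0       # Chinchilla-style ratio D ≈ 20 * N


def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (parameters, tokens) for a given training compute budget.

    With C = 6 * N * D and D = 20 * N we get C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # A budget of about 5.88e23 FLOPs lands near 70B parameters
    # and 1.4T tokens under these assumptions.
    n, d = chinchilla_optimal(5.88e23)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Under these assumptions both the parameter count and the token count grow with the square root of the compute budget, so quadrupling compute roughly doubles each.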


sp332 1141 days ago

For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained on way more data than the Chinchilla-optimal amount.
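
A quick way to see how far past the compute-optimal point a model was trained is to compare its tokens-per-parameter ratio to the roughly 20 tokens per parameter rule of thumb. This is only a back-of-the-envelope sketch: the 20:1 ratio is an approximation, and the example figures (about 7B parameters and about 1T training tokens for the smallest original Llama model) are rounded values I am assuming, not numbers from this thread.

```python
def overtraining_factor(
    n_params: float,
    n_tokens: float,
    optimal_tokens_per_param: float = 20.0,  # assumed rule of thumb
) -> float:
    """How many times more data was used than the rule-of-thumb optimum.

    A value of 1.0 means roughly Chinchilla-optimal; values above 1.0
    mean the model saw more tokens than compute-optimal training would use.
    """
    tokens_per_param = n_tokens / n_params
    return tokens_per_param / optimal_tokens_per_param


if __name__ == "__main__":
    # Assumed, rounded figures for a ~7B-parameter model trained on ~1T tokens.
    factor = overtraining_factor(n_params=7e9, n_tokens=1e12)
    print(f"trained on about {factor:.1f}x the rule-of-thumb token count")
```

The extra training compute buys a smaller model at a given quality level, which is cheaper every time it is run for inference.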
