Hacker News new | ask | show | jobs
by cubefox 1141 days ago
The Chinchilla scaling law describes, apart from the training data size, the optimal number of parameters for a given amount of computing power for training. See

https://dynomight.net/scaling/

1 comments

For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were run way past chinchilla-optimal amounts of data.