Hacker News
by sp332, 1141 days ago
For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained way past Chinchilla-optimal amounts of data.
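To put rough numbers on that, here is a quick sketch. It assumes the common "about 20 training tokens per parameter" reading of the Chinchilla result, which is a rule of thumb rather than an exact law, and uses the nominal sizes and token counts reported for the original LLaMA release. The names `CHINCHILLA_TOKENS_PER_PARAM`, `LLAMA_1`, and `overtraining_factor` are made up for illustration.

```python
# Assumption: ~20 tokens per parameter as the Chinchilla rule of thumb.
CHINCHILLA_TOKENS_PER_PARAM = 20

# Nominal (parameters, training tokens) for the original LLaMA models.
LLAMA_1 = {
    "7B": (7e9, 1.0e12),
    "13B": (13e9, 1.0e12),
    "33B": (33e9, 1.4e12),
    "65B": (65e9, 1.4e12),
}


def overtraining_factor(params, tokens, ratio=CHINCHILLA_TOKENS_PER_PARAM):
    """How many times more tokens were used than the rule of thumb suggests."""
    return tokens / (ratio * params)


if __name__ == "__main__":
    for name, (params, tokens) in LLAMA_1.items():
        factor = overtraining_factor(params, tokens)
        print(f"{name}: {tokens / params:.0f} tokens/param, "
              f"{factor:.1f}x the rule-of-thumb budget")
```

Under these assumptions the smaller models are the ones furthest past the rule of thumb, while the largest one lands close to it. That fits the inference argument: the small models are the cheap ones to serve, so spending extra training compute on them pays off over many inference runs.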