| HN Mirror



	by sp332 1141 days ago
	For training, yes, but these models are optimized for inference, since inference will be run many more times than training. The original Llama models were trained on way more data than the Chinchilla-optimal amount.
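A rough back-of-the-envelope sketch of why this trade-off makes sense. It assumes the common ~20 tokens-per-parameter Chinchilla heuristic, the usual approximations of ~6 FLOPs per parameter per training token and ~2 FLOPs per parameter per generated token, and rounded figures for the smallest original Llama model (~7B parameters, ~1T training tokens). The served-token count is a purely hypothetical workload.

```python
# Rule-of-thumb constants (approximations, not exact values).
TOKENS_PER_PARAM_CHINCHILLA = 20  # Chinchilla heuristic: ~20 training tokens per parameter


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return TOKENS_PER_PARAM_CHINCHILLA * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training cost: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


def inference_flops(n_params: float, n_tokens: float) -> float:
    """Approximate inference cost: ~2 FLOPs per parameter per generated token."""
    return 2 * n_params * n_tokens


n_params = 7e9        # ~7B parameters (rounded)
trained_tokens = 1e12  # ~1T training tokens (rounded)
served_tokens = 1e13   # hypothetical lifetime inference workload

optimal = chinchilla_optimal_tokens(n_params)
print(f"Chinchilla-optimal data: {optimal:.2e} tokens")
print(f"Actually trained on {trained_tokens / optimal:.1f}x that")

train_cost = training_flops(n_params, trained_tokens)
serve_cost = inference_flops(n_params, served_tokens)
print(f"Training: {train_cost:.2e} FLOPs, inference: {serve_cost:.2e} FLOPs")
```

Under these assumptions the model sees roughly 7x its Chinchilla-optimal data, yet lifetime inference still costs more than training. Spending extra training compute on a smaller model therefore pays off every time it is queried.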