	by ummonk 1144 days ago
	Given inference costs and the ability to run on-device, there's an argument to be made for training models that are smaller than Chinchilla-optimal though, especially if you can still eke out improved performance with longer training times.
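
	A rough way to put numbers on that trade-off, as a sketch rather than anything authoritative: the snippet below assumes the parametric loss form L(N, D) = E + A/N^alpha + B/D^beta with the fitted constants reported in the Chinchilla paper (quoted from memory, so treat them as approximate), plus the usual rules of thumb of about 6*N*D FLOPs for training and about 2*N FLOPs per generated token. The function names are made up for illustration. It asks how many tokens a smaller model would need to reach the same predicted loss as a larger one.

```python
# Assumed constants: approximate fitted values for the parametric loss
# L(N, D) = E + A / N**ALPHA + B / D**BETA from the Chinchilla paper.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params, n_tokens):
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_to_match(target_loss, n_params):
    """Tokens a model of size n_params needs to reach target_loss.

    Returns None when the model is too small to ever reach the target,
    i.e. its parameter-limited term alone already exceeds the budget.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return None
    return (B / gap) ** (1.0 / BETA)


def train_flops(n_params, n_tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


def inference_flops_per_token(n_params):
    """Rule-of-thumb inference cost: about 2 FLOPs per parameter per token."""
    return 2 * n_params


if __name__ == "__main__":
    big_n, big_d = 70e9, 1.4e12   # a 70B model trained on 1.4T tokens
    small_n = 35e9                # hypothetical half-size model
    target = loss(big_n, big_d)
    small_d = tokens_to_match(target, small_n)
    print(f"target loss:          {target:.4f}")
    print(f"tokens for 35B model: {small_d:.3e}")
    print(f"training FLOPs ratio (small/big): "
          f"{train_flops(small_n, small_d) / train_flops(big_n, big_d):.2f}")
    print(f"inference FLOPs ratio (small/big): "
          f"{inference_flops_per_token(small_n) / inference_flops_per_token(big_n):.2f}")
```

	Under these assumptions the smaller model needs more tokens to match the loss but is cheaper on every token served afterwards, so the break-even depends on how much inference you expect to do. Below some size `tokens_to_match` returns None, since no amount of extra data closes the gap in this model of the loss.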