| HN Mirror



	by londons_explore 1262 days ago
	Slower training tends to be only a little cheaper, because most modern architectures parallelize well, so the bill depends mostly on the total number of FLOPs rather than on how fast you burn through them. If you want to reduce cost, you need to reduce the model size, and you'll get worse results for less money.
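
	A rough back-of-the-envelope sketch of the point above, in Python. Everything here is an assumption for illustration: the function names, the perfect-scaling default (`efficiency=1.0`), the hardware throughput and price, and the common rule of thumb that a dense transformer needs about 6 * parameters * tokens FLOPs to train. None of the numbers are measurements.

```python
import math


def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute with the ~6 * N * D rule of thumb (an assumption)."""
    return 6.0 * num_params * num_tokens


def training_cost(total_flops: float, num_gpus: int,
                  flops_per_gpu_per_s: float, usd_per_gpu_hour: float,
                  efficiency: float = 1.0) -> tuple[float, float]:
    """Return (cost in USD, wall-clock seconds) for a run.

    efficiency=1.0 means perfect parallel scaling, which is an idealization.
    """
    wall_seconds = total_flops / (num_gpus * flops_per_gpu_per_s * efficiency)
    gpu_hours = num_gpus * wall_seconds / 3600.0
    return gpu_hours * usd_per_gpu_hour, wall_seconds


# Hypothetical numbers, chosen only to make the arithmetic visible.
PARAMS = 1e9          # model parameters
TOKENS = 2e10         # training tokens
GPU_FLOPS = 1e14      # sustained FLOP/s per GPU
PRICE = 2.0           # USD per GPU-hour

flops = training_flops(PARAMS, TOKENS)

# Same job on few GPUs (slow) versus many GPUs (fast).
slow_cost, slow_time = training_cost(flops, 8, GPU_FLOPS, PRICE)
fast_cost, fast_time = training_cost(flops, 64, GPU_FLOPS, PRICE)

# Same token budget, half the parameters.
small_cost, _ = training_cost(training_flops(PARAMS / 2, TOKENS), 64, GPU_FLOPS, PRICE)

print(f"8 GPUs:  ${slow_cost:,.2f} in {slow_time / 3600:.1f} h")
print(f"64 GPUs: ${fast_cost:,.2f} in {fast_time / 3600:.1f} h")
print(f"half-size model on 64 GPUs: ${small_cost:,.2f}")
```

	Under these assumptions the 8-GPU and 64-GPU runs cost the same and differ only in wall-clock time, while halving the parameter count halves the bill. In practice scaling is not perfect, so `efficiency` drops somewhat as you add GPUs, which is where the "a little cheaper" for slower training comes from.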