


	by carderne 241 days ago
	Layman's understanding: Because, per dollar of hardware and electricity, a “cloud” GPU is many times more efficient per output token. It isn’t loading and offloading models, and no part of the GPU sits idle waiting for input. Everything stays fully saturated.
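
	A rough back-of-the-envelope sketch of the utilization argument above. Every number here is made up for illustration, and the function name and parameters are hypothetical. It also assumes the hourly cost is paid whether or not the GPU is busy, which overstates idle electricity somewhat.

```python
def cost_per_million_tokens(hourly_cost_usd, peak_tokens_per_sec, utilization):
    """Cost in USD per one million output tokens.

    hourly_cost_usd: amortized hardware plus electricity per hour (assumed).
    peak_tokens_per_sec: throughput while the GPU is fully busy (assumed).
    utilization: fraction of wall-clock time spent generating tokens.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Illustrative numbers only, not measurements.
# A home GPU that is mostly idle, loading models, or waiting for input:
home = cost_per_million_tokens(hourly_cost_usd=0.50,
                               peak_tokens_per_sec=50,
                               utilization=0.05)
# A cloud GPU kept busy nearly all the time, serving many requests:
cloud = cost_per_million_tokens(hourly_cost_usd=3.00,
                                peak_tokens_per_sec=2000,
                                utilization=0.90)

print(f"home:  ${home:.2f} per 1M tokens")
print(f"cloud: ${cloud:.2f} per 1M tokens")
```

	With the hourly cost fixed, cost per token scales as 1 / utilization, so a GPU that is busy 90% of the time comes out far cheaper per token than one that is busy 5% of the time, even if it costs more per hour.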