| HN Mirror


	by ryao 514 days ago
	When running inference workloads via something like llama.cpp with the model split layer by layer across GPUs (the default), only 1 GPU is doing work at any given moment, so with 5 GPUs you would have 1 active GPU and 4 idle GPUs. That should make the power usage less insane in practice than you expect.
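To put rough numbers on that, here is a minimal sketch of the arithmetic. The wattages are made-up placeholders, not measurements of any real card, and `estimate_power_watts` is a hypothetical helper, so plug in your own figures.

```python
def estimate_power_watts(num_gpus, active_watts, idle_watts, active_at_once=1):
    """Rough total draw when only `active_at_once` GPUs are busy at any moment."""
    if not 0 <= active_at_once <= num_gpus:
        raise ValueError("active_at_once must be between 0 and num_gpus")
    idle_gpus = num_gpus - active_at_once
    return active_at_once * active_watts + idle_gpus * idle_watts


# Hypothetical figures for illustration only (assumptions, not measurements).
NUM_GPUS = 5
ACTIVE_WATTS = 300.0  # assumed draw of one GPU under load
IDLE_WATTS = 30.0     # assumed draw of one GPU sitting idle

# One GPU busy at a time, the other four idle.
sequential = estimate_power_watts(NUM_GPUS, ACTIVE_WATTS, IDLE_WATTS)

# Naive worst case: all five GPUs under load at once.
all_busy = estimate_power_watts(
    NUM_GPUS, ACTIVE_WATTS, IDLE_WATTS, active_at_once=NUM_GPUS
)

print(f"1 active + 4 idle: {sequential:.0f} W")
print(f"all 5 active:      {all_busy:.0f} W")
```

With these placeholder numbers the sequential case comes out to well under a third of the naive "every GPU at full load" figure. The real ratio depends entirely on the actual idle and load draw of your cards.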