by ryao, 514 days ago
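When running inference via something like llama.cpp with its default layer split, the model's layers are spread across the GPUs and processed one after another, so only one GPU is busy at a time for a given request. With 5 GPUs you would have 1 active GPU and 4 idle ones. That should make the power usage in practice far less insane than you might expect.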
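A quick back-of-the-envelope sketch of that arithmetic. The per-GPU figures (350 W busy, 30 W idle) are made-up placeholders, not measurements, so plug in your own cards' numbers. The function names are just for illustration.

```python
# Hypothetical per-GPU power figures in watts; real values depend on the card.
ACTIVE_W = 350.0
IDLE_W = 30.0


def layer_split_draw(num_gpus: int, active_w: float = ACTIVE_W, idle_w: float = IDLE_W) -> float:
    """Approximate draw when layers run sequentially: one GPU computes, the rest idle.

    The busy GPU rotates as the forward pass moves through the layers, but at any
    moment the total is roughly one active card plus (num_gpus - 1) idle cards.
    """
    return active_w + (num_gpus - 1) * idle_w


def all_busy_draw(num_gpus: int, active_w: float = ACTIVE_W) -> float:
    """Worst case you might naively expect: every GPU at its active draw at once."""
    return num_gpus * active_w


if __name__ == "__main__":
    n = 5
    print(f"layer split: {layer_split_draw(n):.0f} W")
    print(f"all busy:    {all_busy_draw(n):.0f} W")
```

With these placeholder numbers, 5 GPUs come out around 470 W under a layer split versus 1750 W if all were pegged at once. Idle draw still adds up, though, so it is not free.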