by carderne
241 days ago
Layman's understanding: relative to hardware and electricity costs, a “cloud” GPU is many times more efficient per output token. It isn't loading and offloading models, and no part of the GPU sits idle waiting for input. Everything is kept saturated nearly all the time.
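To put rough numbers on that intuition, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `cost_per_token`, the hourly cost, the peak throughput, and both utilization figures are hypothetical, not measurements. It only models idle time (the same fixed hourly cost spread over fewer tokens) and ignores everything else.

```python
def cost_per_token(hourly_cost_usd: float,
                   peak_tokens_per_sec: float,
                   utilization: float) -> float:
    """Dollars per output token for a GPU that is busy `utilization` of the time.

    Hardware and electricity are folded into one hourly cost, which is
    paid whether or not the GPU is producing tokens.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour


# Hypothetical figures, chosen only to show the shape of the argument.
HOURLY_COST_USD = 2.0         # assumed all-in cost of running the GPU for an hour
PEAK_TOKENS_PER_SEC = 1000.0  # assumed throughput when fully busy

saturated = cost_per_token(HOURLY_COST_USD, PEAK_TOKENS_PER_SEC, utilization=0.90)
mostly_idle = cost_per_token(HOURLY_COST_USD, PEAK_TOKENS_PER_SEC, utilization=0.05)

# The ratio depends only on the two utilization figures, not on cost or speed.
ratio = mostly_idle / saturated
print(f"saturated:   ${saturated:.8f} per token")
print(f"mostly idle: ${mostly_idle:.8f} per token")
print(f"idle GPU costs about {ratio:.0f}x more per token")
```

Under these assumed figures the hourly cost and peak speed cancel out, so the gap per token is just the ratio of the two utilization levels.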