by m00x
564 days ago
Because the CPU has to load the model in parts on every cycle, you spend a lot of time on IO, and that cuts into actual processing. You're talking about completely different things here. It's fine if you're handling a few requests at home, but if you're actually serving AI models, CUDA is the only reasonable choice other than ASICs.