by qeternity
819 days ago
This is not CUDA's moat. The moat is on the R&D/training side. The inference side is partly about performance, but mostly about cost per token. And given that there has been a ton of standardization around LLaMA architectures, AMD/ROCm can target inference much more easily, and still take a nice chunk of the inference market for non-SOTA models.