|
|
|
|
|
by vegabook
732 days ago
|
|
Summary: AMD works if you spend 500m USD+ with them. Then they'll throw an army of their own software engineers into the contract who will hold your hand every step of the way, and remove all the jank for you. By contrast, since at least 10 years ago, I could buy any GTX card and CUDA worked out of the box, and that applied right down to a $99 Jetson Nano. AMD's strategy looks a lot like IBM's mainframe strategy of the 80s. And that didn't go well. |
|
The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code, or materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank”, for these customers. It’s that these customers don’t need a modern DL stack.
Let’s not pretend like CUDA works/has always worked out of the box. There’s forced obsolescence (“CUDA compute capability”). CUDA didn’t even have backwards compatibility for minor releases (.1,.2, etc.) until version 11.0. The distinction between CUDA, CUDA toolkit, CUDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/localLlama and r/StableDiffusion).
Directionally, AMD is trending away from your mainframe analogy.
The first consumer cards got official ROCm support in 5.0. And you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally). Developer support is improving (arguably too slowly), but it’s improving. Hugging Face, Cohere, MLIR, Lamini, PyTorch, TensorFlow, DataBricks, etc all now have first party support for ROCm.