by froonly
814 days ago
I have 40+ yrs of HPC/AI apps/performance engineering experience & I was one of the 1st people to port LAPACK and a number of other numerical libs to CUDA. Moreover, many of those major DoE + AI sites are my customers. You should not confuse AMD's general & long-standing indifference/incompetence wrt SW with the actual difficulty of providing a portable SW path for acceleration. As Woody Allen once said: "90% of success is showing up." But what happened in AI, when, in a very short period of time, almost everyone moved away from writing their code directly in CUDA to writing it in frameworks like TensorFlow & PyTorch, is all the evidence anyone needs to show just how unsound that SW obstacle is.
Ah yes, pytorch:
1) Check the issues, PRs, etc. on the torch GitHub. Relative to its market share, ROCm has a multiple of the number of open and closed issues that CUDA does. There is still much work to be done for things as basic as overall stability.
2) torch is the bare minimum. Consider flash attention. On CUDA it just runs, of course, with sliding window attention, ALiBi, and PagedAttention. The ROCm fork? Nope. Then check out the xFormers situation on ROCm. Prepare to spend your time messing around with ROCm, spelunking GH issues/PRs/blogs, etc. and going one by one through frameworks and libraries instead of running `pip install` and actually doing your work.
3) Repeat for hundreds of libraries, frameworks, etc depending on your specific use case(s).
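To make the "going one by one" part concrete, here is a rough sketch of the kind of audit script you end up writing. The package list and the `probe`/`backend` helpers are my own illustration, not anything official. Note that a package being importable says nothing about whether it actually works on your GPU.

```python
import importlib.util

# Illustrative checklist only: import names of packages mentioned above.
CANDIDATES = ["torch", "flash_attn", "xformers", "vllm"]


def probe(names):
    """Return {name: bool} for whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def backend():
    """Best-effort guess at the installed torch build.

    Returns 'rocm', 'cuda', 'cpu', or None if torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    if getattr(torch.version, "hip", None):
        return "rocm"
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("torch build:", backend())
    for name, found in probe(CANDIDATES).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Even when everything shows up as found, you still have to run each library's own tests on the actual hardware to find out what is really supported.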
Then, once you have a model and need to serve it up for inference so your users can actually make use of it and you can get paid? With CUDA you can choose between torchserve, HF TEI/TGI, Nvidia Triton Inference Server, vLLM, and a number of others. On ROCm, vLLM has what I would call (at best) "early" support that requires patches to ROCm, isn't feature complete, and regularly gets commits to fix yet another show-stopping bug/crash/performance regression/whatever.
Torch support is a good start but it's just that - a start.