Does anyone know the current state of AMD's tools for migrating from CUDA? There's so much untapped potential in these cards; it's crazy that basically only gamers can make use of their competitive prices.
Last time I seriously checked (6 months ago or so), ROCm was still a far cry from CUDA. Setup was a mess, support was hit and miss, and some operations were not particularly performant compared to their CUDA counterparts. Additionally, there are TensorFlow and probably PyTorch forks that should work with it, but they lag behind the official repositories quite a bit.
I hope that now that generative AI is becoming mainstream, AMD steps up their game on both their consumer and professional lineups. If I were to buy a video card right now (mostly for gaming + ML hobby projects + running Stable Diffusion), I wouldn't pick AMD, because I could do just 1/3 of my use cases (gaming) properly without headaches.
Thankfully, for a good chunk of number crunching that works fine. But the other side of the coin is notably AI workloads. There's no OpenCL or Vulkan standard for exposing matrix units, only vendor-specific ones.
For OpenCL: cl_qcom_ml_ops (Qualcomm) notably,
for Vulkan: VK_NV_cooperative_matrix (NVIDIA)
Do you have an opinion on the new OpenCL implementation that recently got merged into Mesa? It doesn't touch on tooling or the other points you mentioned, but performance seems to be pretty good!
What do you mean by a polyglot compiler infrastructure? Are you referring to the fact that CUDA source is single-file (your host and device code are in the same compilation unit)? Or do you mean that you can ship the same binary to different GPU architectures?
SYCL solves the first issue, and SPIR-V solves the second one. (OpenCL mostly avoids the issue in general though by making you ship source which is then compiled by the driver, but SPIR-V allows you to ship a 'binary' instead).
No clue about debugging and IDE tooling, but I did find a rocgdb binary on my Linux ROCm installation (which is for HIP, not SYCL). No clue what oneAPI offers for debugging.
Furthermore, Clang (and hence clangd) speaks HIP and I think SYCL too. So the non-runtime IDE tooling should work.
Finally, a lot of GPU libraries are, I think, available for ROCm/HIP too. It's unfortunate that the HIP stack sucks enormously in other ways.
As I understand it, AMD HIP is a CUDA clone, where library functions have the same syntax but with hip replacing cuda in the function names.
Under the hood, it can target AMD and NVIDIA hardware alike. Thus, the idea is that by porting to HIP, typically with negligible effort, your code becomes vendor-independent.
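To make the renaming idea concrete, here is a toy Python sketch of that kind of source-to-source pass. The name pairs in the table are real CUDA/HIP runtime names as far as I know, but the helper `toy_hipify` and the sample snippet are made up for illustration. This is not AMD's actual hipify tooling, which handles far more than a handful of renames.

```python
import re

# Small hand-picked rename table (assumption: illustrative only, far from
# complete, and not how AMD's real hipify tools are implemented).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match only whole identifiers, so names missing from the table are left alone.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source: str) -> str:
    """Return a copy of `source` with the known CUDA names swapped for HIP ones."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


# Hypothetical host-side CUDA snippet used as input.
CUDA_SNIPPET = """\
#include <cuda_runtime.h>

int main() {
    float host[4] = {1, 2, 3, 4};
    float *dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    cudaFree(dev);
    return 0;
}
"""

if __name__ == "__main__":
    print(toy_hipify(CUDA_SNIPPET))
```

The structure of the program is untouched; only the API prefix and the header change, which is why the port is usually described as mechanical.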
Then comes the problem of AMD not supporting ROCm HIP on most of their hardware or user base.
On Windows, the ROCm HIP SDK is private and only available under NDA. This means that while you can use Blender w/ HIP on Windows, the Blender builds that you compile yourself will not be able to use ROCm HIP.
On Linux, supported GPUs are few and far between: Vega20 onwards is supported today, but APUs, RDNA1, and lower-end RDNA2 (6700 XT and below) are excluded unless you resort to unsupported hacks.
It's quite baffling. AMD is behaving like an incumbent trying to segment users etc. when they really should behave more like an upstart trying to make things easy. But their drivers for Linux are the best, so I don't think I'll switch to Nvidia...
> What a lot of SW codebases did to support AMD (see PyTorch code notably): codebase is still CUDA, have the conversion pass to HIP done at build time.
This is sort of echoed in AMD's stance on FSR2/upscaling, where they have explicitly stated they will not support any API that allows plugging, regardless of whether the API is open-source or not, or who owns it, because a pluggable API might allow plugging proprietary implementations. Their opinion is their solution is what's best for everyone, in every situation, so you don't really need DLLs or pluggability because why would you want to plug something worse? FSR2 is the best for everyone and you should really just be compiling it directly into your application.
(this of course also makes it impossible to update versions of FSR2 if better ones come out subsequently - you can't do the DLSS thing where you swap in newer DLLs (at your own risk of course) and benefit from later improvements to a modular grouping of code. You know, sort of the whole concept of libraries in the first place...)
The HIP stuff is the same thing... AMD really wants you to convert once to HIP and be locked in forever, because Theirs Is The Best, Why Would You Need Anything Else? But of course HIP doesn't have a PTX-like concept so you really need to distribute as source and compile everything at runtime... because who would want library code or dynamic linking?
Anyway, like, I know it's not really a shocker but the "we love open-source!" thing is a bit of an act. They love it when it's an angle for them as the underdog to leverage their way into marketshare... and as the underdog when it's not favorable for them (like FSR) they'll abandon their pro-freeness stance. And they too have their closed, proprietary technologies (like their CXL alternative that only works with their CPU+peripherals and nobody else can use) that they don't open up either. Nor is AMD racing to open up chipsets (like the NForce or Abit days) either, that's all locked down and proprietary too.
I know that's not really a shocker when you put it like that, but, AMD really gets a ton of the benefit-of-the-doubt all the time. They have on multiple occasions shipped defective/marginal silicon at launch for example, and it all just gets brushed over and people forget all about it. Both Zen2 (low-quality silicon in the launch batch meant chips were missing advertised boost clocks by 10%+) and RDNA1 had massive incidents of the community downplaying very real problems because AMD Is Good Now, many of the affected users never had their problems resolved and they just kinda sighed and lived with it or sold the hardware and bought something better, and the fans swept it all under the rug and never talked of it again. Same for pandemic profiteering (while Intel cut prices), etc. There's just a ton of shit that people bend over backwards to find justifications for with AMD that just wouldn't fly with more reputable vendors.
A big part of the reason is that Blender on Nvidia supports hardware accelerated ray tracing using OptiX. HIP-RT exists, but is not used in Blender yet. I think the Intel oneAPI backend for Arc GPUs also misses RT acceleration.
AMD claims to have HIP-RT working internally, but says it is not yet suitable for public release. Intel is planning it, I think. Both should land around Blender 3.6, if I'm not mistaken.
If you take the raw FLOPS, CUDA (not OptiX) and HIP are actually nearly equivalent in performance last I remember. I think RDNA2 just does "more with less", at least in terms of gaming performance per FLOP (e.g. due to the huge cache).