| HN Mirror


by paulmd 1277 days ago

> AMD fucked up by not having a stable IR between GPU generations

The lack of a stable IR is probably deliberate. Much like the "we won't support DLLs or pluggable APIs, only static compilation into your application" stance with FSR2, once you port to HIP you're locked in. AMD wants you working in HIP and compiling from HIP, not treating it as an IR; they don't want to be an alternate runtime for NVIDIA's ecosystem.

And again, much like FSR2, they are in fact willing to compromise end-user experience (no updates) or developer convenience (continual patching) in order to do it. No libraries, only distribute as source, ever.

It's not about library pluggability or runtime compatibility (after all, GPU Ocelot already existed); what they want is you building the ROCm ecosystem and not the CUDA ecosystem or the oneAPI ecosystem.

That's understandable from a corporate strategy perspective, as a corporation you don't want to be building a product on someone else's platform, because that gives a lot of freedom for the platform owner to fuck with you. But like, the whole "we won't even do libraries/IR" is a little crass from a customer experience/developer experience perspective, and it kinda goes against the whole good-guy-AMD mythos they've built up.

1 comments

ColonelPhantom 1277 days ago

The problem described is not that you have to statically link your HIP kernels. (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

The problem is that using machine code makes it machine-specific. So if I compile a HIP program for my gfx803 (RX 580) card, I won't be able to run the same binary on someone else's 6800 XT (gfx103x) system. (I think technically you can put both in a single binary, but that's still a terrible solution).
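To make that concrete, here is a rough toy model in Python. It is not the real ROCm loader: `select_code_object`, `NoCodeObjectError`, and the dictionary layout are all made up for illustration. A binary is treated as a map from GPU target name to precompiled machine code, and loading fails when the installed GPU's target was never compiled in.

```python
class NoCodeObjectError(RuntimeError):
    """Raised when a binary carries no machine code for the installed GPU."""


def select_code_object(fat_binary: dict[str, bytes], device_target: str) -> bytes:
    """Return the machine code for device_target, or fail if none was shipped.

    Hypothetical helper: a real loader does far more, but the lookup is
    the part that makes machine-code-only binaries machine-specific.
    """
    if device_target not in fat_binary:
        shipped = ", ".join(sorted(fat_binary))
        raise NoCodeObjectError(
            f"no code object for {device_target}; binary only contains: {shipped}"
        )
    return fat_binary[device_target]


# Placeholder byte strings stand in for real ISA.
rx580_only = {"gfx803": b"gfx803-isa"}
fat = {"gfx803": b"gfx803-isa", "gfx1030": b"gfx1030-isa"}

# The "fat" binary works on both cards, at the cost of shipping both copies.
print(select_code_object(fat, "gfx1030"))

# The binary built only for the RX 580 cannot run on the 6800 XT.
try:
    select_code_object(rx580_only, "gfx1030")
except NoCodeObjectError as err:
    print(err)
```

The only way to make the second case work is to rebuild with the new target added, which is exactly the complaint.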

CUDA instead ships PTX (NVPTX is the name of the LLVM backend that emits it), an IR that the driver can compile to machine code as long as the GPU's compute capability is at least the one the PTX was generated for. This is similar to how it works in the graphics world: you submit GLSL source or SPIR-V bytecode (or, for HLSL, the compiled DXBC/DXIL bytecode) to the driver, which compiles it for the right GPU.
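A minimal sketch of that compatibility rule, again as a toy Python model rather than anything from the CUDA driver API: `PtxModule`, `can_jit`, and `load_module` are hypothetical names, and the model ignores architecture-specific PTX variants. The point is only that the check is "device is at least as new as the IR's target" instead of "device matches exactly".

```python
from dataclasses import dataclass

# (major, minor), e.g. (8, 6). Tuples compare lexicographically,
# which matches how compute capabilities are ordered.
ComputeCapability = tuple[int, int]


@dataclass(frozen=True)
class PtxModule:
    """Hypothetical stand-in for IR embedded in an application."""
    target_cc: ComputeCapability  # compute capability the IR was generated for
    text: str


def can_jit(module: PtxModule, device_cc: ComputeCapability) -> bool:
    """The IR is usable on any device at least as new as its target."""
    return device_cc >= module.target_cc


def load_module(module: PtxModule, device_cc: ComputeCapability) -> str:
    """Pretend to JIT the IR; returns the name of the target compiled for."""
    if not can_jit(module, device_cc):
        raise RuntimeError(
            f"IR targets compute capability {module.target_cc}, "
            f"device only has {device_cc}"
        )
    major, minor = device_cc
    return f"sm_{major}{minor}"


old_ir = PtxModule(target_cc=(6, 1), text="// IR text would go here")

# A newer GPU can still run the old binary, because the driver compiles the IR.
print(load_module(old_ir, (8, 6)))
```

Compare this with an exact-match lookup on a target name: there, a GPU released after the application was built can never be found.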

Intel's oneAPI/Level Zero API ships SPIR-V, afaik (or maybe regular SPIR?). oneAPI can also work on top of OpenCL instead of L0. SPIR-V is neat because it's an open standard, so in theory you can get L0 working on non-Intel GPUs (and iirc Intel also uses it for e.g. FPGAs). But both SPIR-V and PTX solve the "machine-specific" problem AMD has.


my123 1276 days ago

Old SPIR is dead (it was an LLVM IR dialect); oneAPI L0 uses SPIR-V.

> (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

Orochi sidesteps this problem... by only supporting NVRTC-style runtime compilation with C++ as input.

And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.


ColonelPhantom 1276 days ago

> And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.

Ugh. Leave it to AMD to make something that technically works but is an absolute nightmare.

IIRC this machine code nonsense is also the reason that GPU support is such an issue for AMD: to 'support' a chip, they need to bake binaries for that chip in all libraries. So to enable RDNA1, they'd need to ship RDNA1 code in all their libraries, which would make the install size balloon to crazy levels. At least Intel got it right.
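Back-of-the-envelope version of the size argument, as a small Python sketch. The numbers are made up purely for illustration (they are not measured ROCm sizes), and `machine_code_size_mb` is a hypothetical helper; the only assumption is that each supported target needs its own copy of every kernel.

```python
def machine_code_size_mb(size_per_target_mb: int, targets: list[str]) -> int:
    """Total kernel payload when every target gets its own machine code."""
    return size_per_target_mb * len(targets)


# Made-up figure: one library's kernels take 50 MB per GPU target.
PER_TARGET_MB = 50

# Illustrative target lists, before and after "enabling" one more chip.
supported = ["gfx900", "gfx906", "gfx908"]
with_one_more = supported + ["gfx1010"]

before = machine_code_size_mb(PER_TARGET_MB, supported)
after = machine_code_size_mb(PER_TARGET_MB, with_one_more)

print(f"before: {before} MB, after: {after} MB, growth: {after - before} MB")
```

With an IR, the payload would be a single copy regardless of how many GPUs can consume it, so adding a chip would cost driver work instead of install size in every library.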

I do believe that running oneAPI on AMD is possible, but it still needs HIP/ROCm? Wonder if it would be possible to bake a L0 backend for AMD that just uses SPIR-V like the Intel stuff does, side-stepping this issue entirely.

Frankly I wish AMD and Intel just started working together more on this stuff. Both of them stand to gain from a cross-vendor standard that works well.


my123 1276 days ago

> So to enable RDNA1, they'd need to ship RDNA1 code in all their libraries

RDNA1? More like 3 binary slices: Navi10 (5700 XT), Navi12 (AWS G4ad) and Navi14 (5500 XT) each require separate binaries!

> I do believe that running oneAPI on AMD is possible, but it still needs HIP/ROCm?

Yes, oneAPI runtimes for AMD GPUs rely on an underlying HIP implementation.

> Wonder if it would be possible to bake a L0 backend for AMD

Yes. But why would anybody not named AMD do that? It's AMD's hardware so AMD has to support it. OSS/hobbyists can only do so much.

> Frankly I wish AMD and Intel just started working together more on this stuff

Why? AMD truly does not care about GPGPU APIs for the masses. For their management it's a useless additional expense so they haven't been doing it.

A chunk of the community has wanted to consider AMD as an NV alternative for this, but AMD is not selling the same product. They treat their gaming GPU line as gaming-centred, often with bare-minimum support for other markets, if any, while NV cares about a much wider audience.

That's how the market ended up with headlines like: "Q3 2022 Discrete GPU Market Share Report: NVIDIA Gains 88% Market Share Hold, AMD Now at 8% Followed By Intel at 4%".
