
by ColonelPhantom 1278 days ago

The problem described is not that you have to statically link your HIP kernels. (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

The problem is that using machine code makes it machine-specific. So if I compile a HIP program for my gfx803 (RX 580) card, I won't be able to run the same binary on someone else's 6800 XT (gfx103x) system. (I think technically you can put both in a single binary, but that's still a terrible solution).

CUDA instead ships NVPTX, which is an IR that the driver can compile to machine code as long as the GPU has the right compute capability. This is similar to how it works in the graphics world: you submit your GLSL/HLSL source code or SPIR bytecode to the driver, which compiles it for the right GPU.
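To make the difference concrete, here is a toy Python sketch of the two shipping models. It is not how any real loader works: `MachineCodeBundle`, `IRBundle`, and `fake_driver_jit` are hypothetical names made up for illustration, and the capability check is reduced to a tuple comparison.

```python
# Toy model only: real loaders (HIP code objects, CUDA fat binaries) are far more involved.
from dataclasses import dataclass, field


@dataclass
class MachineCodeBundle:
    """Binary that embeds precompiled kernels, one per exact GPU target."""
    kernels: dict[str, bytes] = field(default_factory=dict)  # target name -> machine code

    def load(self, gpu_target: str) -> bytes:
        # No fallback: if the exact target was not compiled in, the kernel cannot run.
        if gpu_target not in self.kernels:
            raise RuntimeError(f"no code object for {gpu_target}")
        return self.kernels[gpu_target]


@dataclass
class IRBundle:
    """Binary that embeds one portable IR blob plus a minimum capability."""
    ir: str
    min_capability: tuple[int, int]

    def load(self, gpu_capability: tuple[int, int], gpu_name: str) -> bytes:
        # The driver compiles the IR at load time for whatever GPU is present.
        if gpu_capability < self.min_capability:
            raise RuntimeError(f"{gpu_name} is below capability {self.min_capability}")
        return fake_driver_jit(self.ir, gpu_name)


def fake_driver_jit(ir: str, gpu_name: str) -> bytes:
    # Hypothetical stand-in for the driver's JIT compiler.
    return f"{gpu_name}:{ir}".encode()
```

With this model, `MachineCodeBundle({"gfx803": b"..."}).load("gfx1030")` raises, because only the RX 580 target was compiled in, while an `IRBundle` built once keeps loading on any GPU that meets its minimum capability, including ones released after the binary shipped.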

Intel's oneAPI/Level Zero API ships SPIR-V, afaik (or maybe regular SPIR?). oneAPI can also work on top of OpenCL instead of L0. SPIR-V is neat because it's an open standard, so in theory you can get L0 working on non-Intel GPUs (and iirc Intel also uses it for e.g. FPGAs). But both SPIR-V and NVPTX solve the "machine-specific" problem AMD has.


my123 1278 days ago

Old SPIR is dead (was an LLVM dialect), oneAPI L0 uses SPIR-V.

> (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

Orochi sidesteps this problem... by only supporting NVRTC-style runtime compilation with C++ as input.

And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.


ColonelPhantom 1277 days ago

> And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.

Ugh. Leave it to AMD to make something that technically works but is an absolute nightmare.

IIRC this machine code nonsense is also the reason that GPU support is such an issue for AMD: to 'support' a chip, they need to bake binaries for that chip in all libraries. So to enable RDNA1, they'd need to ship RDNA1 code in all their libraries, which would make the install size balloon to crazy levels. At least Intel got it right.

I do believe that running oneAPI on AMD is possible, but it still needs HIP/ROCm? Wonder if it would be possible to bake a L0 backend for AMD that just uses SPIR-V like the Intel stuff does, side-stepping this issue entirely.

Frankly I wish AMD and Intel just started working together more on this stuff. Both of them stand to gain from a cross-vendor standard that works well.


my123 1277 days ago

> So to enable RDNA1, they'd need to ship RDNA1 code in all their libraries

RDNA1? More like three binary slices: Navi10 (5700 XT), Navi12 (AWS G4ad) and Navi14 (5500 XT) each require separate binaries!
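A rough Python sketch of why this multiplies library size. The `gfx` names are my recollection of the LLVM targets for these chips, and the kernel count used below is invented purely for illustration; `code_objects_needed` and `ir_blobs_needed` are hypothetical helpers, not anything from ROCm.

```python
# Illustrative only: target names are from memory and no real library sizes are used.
RDNA1_TARGETS = {
    "Navi10 (5700 XT)": "gfx1010",
    "Navi12 (AWS G4ad)": "gfx1011",
    "Navi14 (5500 XT)": "gfx1012",
}


def code_objects_needed(num_kernels: int, targets: dict[str, str]) -> int:
    # Machine code: every kernel is compiled once per exact target.
    return num_kernels * len(targets)


def ir_blobs_needed(num_kernels: int) -> int:
    # Portable IR: one blob per kernel, no matter how many GPUs exist.
    return num_kernels
```

For a made-up library with 1000 kernels, `code_objects_needed(1000, RDNA1_TARGETS)` gives 3000 embedded code objects for RDNA1 alone, versus 1000 blobs with an IR. Every additional supported chip adds another full set.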

> I do believe that running oneAPI on AMD is possible, but it still needs HIP/ROCm?

Yes, oneAPI runtimes for AMD GPUs rely on an underlying HIP implementation.

> Wonder if it would be possible to bake a L0 backend for AMD

Yes. But why would anybody not named AMD do that? It's AMD's hardware so AMD has to support it. OSS/hobbyists can only do so much.

> Frankly I wish AMD and Intel just started working together more on this stuff

Why? AMD truly does not care about GPGPU APIs for the masses. For their management it's a useless additional expense so they haven't been doing it.

A chunk of the community has wanted to consider AMD as an NV alternative for this, but AMD are not selling the same product. They treat their gaming GPU line as gaming-centred, often with bare-minimum support for other markets, if any, while NV cares about a much wider audience.

That's how the market ended up with: "Q3 2022 Discrete GPU Market Share Report: NVIDIA Gains 88% Market Share Hold, AMD Now at 8% Followed By Intel at 4%".
