by ColonelPhantom
1278 days ago
The problem described is not that you have to statically link your HIP kernels. (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

The problem is that shipping machine code makes the binary machine-specific. So if I compile a HIP program for my gfx803 (RX 580) card, I won't be able to run the same binary on someone else's 6800 XT (gfx103x) system. (I think technically you can put both in a single binary, but that's still a terrible solution.)

CUDA instead ships PTX (NVPTX in LLVM terms), which is an IR that the driver can compile to machine code as long as the GPU has the right compute capability. This is similar to how it works in the graphics world: you submit your GLSL/HLSL source code or SPIR-V bytecode to the driver, which compiles it for the right GPU.

Intel's oneAPI/Level Zero API ships SPIR-V, afaik (or maybe regular SPIR?). oneAPI can also work on top of OpenCL instead of L0. SPIR-V is neat because it's an open standard, so in theory you can get L0 working on non-Intel GPUs (and iirc Intel also uses it for e.g. FPGAs). But both SPIR-V and PTX solve the "machine-specific" problem AMD has.
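To make the difference concrete, here is a toy model of the two shipping strategies. It is only a sketch: `Binary`, `load_kernel`, and the `driver_can_jit` flag are made-up names for illustration, not anything from the ROCm or CUDA loaders, and the real lookup rules are more involved.

```python
# Toy model only: hypothetical names, not the real ROCm or CUDA loader.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Binary:
    # Precompiled machine code keyed by exact GPU target, e.g. "gfx803".
    machine_code: Dict[str, bytes] = field(default_factory=dict)
    # Optional portable IR (stands in for PTX or SPIR-V).
    ir: Optional[bytes] = None


def load_kernel(binary: Binary, gpu_target: str, driver_can_jit: bool) -> str:
    """Describe how the kernel would be obtained, or raise if it cannot be."""
    if gpu_target in binary.machine_code:
        return f"native:{gpu_target}"
    if binary.ir is not None and driver_can_jit:
        # The driver compiles the IR for whatever GPU is installed.
        return f"jit:{gpu_target}"
    raise RuntimeError(f"no code object for {gpu_target}")


# Machine code only, built for one card.
hip_style = Binary(machine_code={"gfx803": b"\x00"})
# IR only, compiled by the driver at load time.
ir_style = Binary(ir=b"portable")
```

In this model `hip_style` loads on gfx803 and raises on any other target, while `ir_style` loads anywhere the driver can JIT. Adding more entries to `machine_code` is the "put both in a single binary" workaround, which still cannot cover a GPU that did not exist at build time.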
|
> (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).
Orochi sidesteps this problem... by only supporting NVRTC-style runtime compilation with C++ as input.
And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.
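A rough sketch of why the location of the compiler matters, again as a toy model. `RuntimeCompiler`, `known_targets`, and the target name `gfx_future` are all hypothetical and have nothing to do with Orochi's actual API; the only assumption is that a runtime compiler can only emit code for the targets it knew about when it was released.

```python
# Toy model only: hypothetical names, not Orochi's or HIP's real API.
class RuntimeCompiler:
    def __init__(self, known_targets):
        # Targets this compiler release can generate code for.
        self.known_targets = frozenset(known_targets)

    def compile(self, source: str, gpu_target: str) -> str:
        if gpu_target not in self.known_targets:
            raise RuntimeError(f"compiler predates {gpu_target}")
        return f"code[{gpu_target}]"


def run_app(source: str, gpu_target: str, compiler: RuntimeCompiler) -> str:
    # The app always ships source; what varies is which compiler it reaches.
    return compiler.compile(source, gpu_target)


# Compiler frozen at the version the app was released with.
shipped_with_app = RuntimeCompiler({"gfx803", "gfx1030"})
# Compiler that gets updated alongside support for new hardware.
updated_later = RuntimeCompiler({"gfx803", "gfx1030", "gfx_future"})
```

With `shipped_with_app`, the unchanged app fails on `gfx_future` even though it ships source code; with `updated_later` the same app works without a rebuild.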