by CamperBob2 1277 days ago
> When you basically have doomed humanity to rely on a single (malicious) company for a technology that is as important as AI

I don't disagree, but how did this argument fare against Microsoft? Is there a reason you expect it to fare better against Nvidia? That sweaty guy jumping around yelling "Developers! Developers! Developers!" had a point.

Well I wouldn't have recommended building anything foundational on .NET either. But .NET is open source and runs almost everywhere now.

I would be fine with CUDA if Nvidia would allow anyone (AMD/Intel) to make implementations for their GPUs as well.

See ROCm HIP which is basically just that. AMD chose to rename all the function prefixes but it's what you are asking for here.
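
To make the "renamed prefixes" point concrete, here's a toy Python sketch that rewrites a handful of CUDA runtime identifiers to their HIP names. The table only lists a few pairs, and `to_hip` is a made-up helper for illustration; AMD's actual hipify tooling does a lot more than this.

```python
import re

# A small sample of CUDA runtime names and their HIP counterparts.
# The real API surface is much larger than this table.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, so cudaMemcpy does not clobber
# cudaMemcpyHostToDevice.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")


def to_hip(source: str) -> str:
    """Hypothetical helper: rename known CUDA identifiers to HIP ones."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_snippet = "cudaMalloc(&d_x, n); cudaMemcpy(d_x, x, n, cudaMemcpyHostToDevice);"
print(to_hip(cuda_snippet))
```

Identifiers that are not in the table are left untouched, which is roughly where the real porting work starts.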

AMD fucked up by not having a stable IR between GPU generations and not having a public Windows SDK. But that's their own problem, not NVIDIA's.

> AMD fucked up by not having a stable IR between GPU generations

The lack of a stable IR is probably deliberate. Much like the "we won't support DLLs or pluggable APIs, only statically compiling it into your application" with FSR2, once you port to HIP you're locked in. AMD wants you working in HIP, compiling from HIP, not treating them as an IR - they don't want to be an alternate runtime for NVIDIA's ecosystem.

And again, much like FSR2, they are in fact willing to compromise end-user experience (no updates) or developer convenience (continual patching) in order to do it. No libraries, only distribute as source, ever.

It's not about library pluggability or runtime compatibility (after all, GPU Ocelot already existed); what they want is you building the ROCm ecosystem and not the CUDA ecosystem or oneAPI ecosystem.

That's understandable from a corporate strategy perspective: as a corporation you don't want to be building a product on someone else's platform, because that gives a lot of freedom for the platform owner to fuck with you. But like, the whole "we won't even do libraries/IR" is a little crass from a customer experience/developer experience perspective, and it kinda goes against the whole good-guy-AMD mythos they've built up.

The problem described is not that you have to statically link your HIP kernels. (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

The problem is that using machine code makes it machine-specific. So if I compile a HIP program for my gfx803 (RX 580) card, I won't be able to run the same binary on someone else's 6800 XT (gfx103x) system. (I think technically you can put both in a single binary, but that's still a terrible solution).
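
A toy Python model of that failure mode, in case it helps: the "binary" is just a dict of precompiled code objects keyed by ISA name, and loading is an exact-match lookup. `load_kernel` and `NoCodeObjectError` are invented names and the payloads are placeholders; this is not how the ROCm loader is implemented, only the shape of the problem.

```python
class NoCodeObjectError(RuntimeError):
    """Raised when the binary has nothing built for the GPU that is present."""


def load_kernel(fat_binary: dict, device_arch: str) -> bytes:
    """Return the code object built for exactly this ISA, or fail."""
    try:
        return fat_binary[device_arch]
    except KeyError:
        raise NoCodeObjectError(
            f"no code object for {device_arch}; built for {sorted(fat_binary)}"
        ) from None


# Compiled only for the RX 580.
built_for_rx580 = {"gfx803": b"<gfx803 machine code>"}

load_kernel(built_for_rx580, "gfx803")  # fine on the machine it was built for

try:
    # gfx1030 is used here as the 6800 XT's member of the gfx103x family.
    load_kernel(built_for_rx580, "gfx1030")
except NoCodeObjectError as err:
    print(err)
```

Putting both targets in one binary just means adding a second key to the dict, which works until the next GPU generation ships.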

CUDA instead ships NVPTX, which is an IR that the driver can compile to machine code as long as the GPU has the right compute capability. This is similar to how it works in the graphics world, where you submit your GLSL/HLSL source code or SPIR bytecode to the driver, which compiles it for the right GPU.
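
Sketched the same way in Python, the IR approach looks something like this. `PortableKernel` and `jit_for_device` are hypothetical names, the "JIT" just returns a string, and the only rule modelled is the one described above: the GPU's compute capability has to be at least what the IR was generated for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortableKernel:
    ir_text: str           # stand-in for the IR the application ships
    min_capability: tuple  # (major, minor) the IR was generated for


def jit_for_device(kernel: PortableKernel, device_capability: tuple) -> str:
    """Pretend driver JIT: lower the IR for whatever GPU is present."""
    if device_capability < kernel.min_capability:
        raise RuntimeError("GPU is older than the IR target")
    major, minor = device_capability
    return f"machine code for sm_{major}{minor}"


kernel = PortableKernel(ir_text="// IR goes here", min_capability=(7, 0))

# The same shipped artifact works on GPUs that did not exist at build time.
for capability in [(7, 0), (8, 6), (9, 0)]:
    print(jit_for_device(kernel, capability))
```

The app ships one artifact and the per-GPU work moves into the driver, which is the piece that gets updated when new hardware comes out.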

Intel's oneAPI/Level Zero API ships SPIR-V, afaik (or maybe regular SPIR?). oneAPI can also work on top of OpenCL instead of L0. SPIR-V is neat because it's an open standard, so in theory you can get L0 working on non-Intel GPUs (and iirc Intel also uses it for e.g. FPGAs). But both SPIR-V and NVPTX solve the "machine-specific" problem AMD has.

Old SPIR is dead (was an LLVM dialect), oneAPI L0 uses SPIR-V.

> (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

Orochi sidesteps this problem... by only supporting NVRTC-style runtime compilation with C++ as input.

And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.

> And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.

Ugh. Leave it to AMD to make something that technically works but is an absolute nightmare.

IIRC this machine code nonsense is also the reason that GPU support is such an issue for AMD: to 'support' a chip, they need to bake binaries for that chip in all libraries. So to enable RDNA1, they'd need to ship RDNA1 code in all their libraries, which would make the install size balloon to crazy levels. At least Intel got it right.
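
Back-of-the-envelope version of the install size argument, in Python. Every number here is invented purely to show the scaling, and `library_payload_mb` is a hypothetical helper; real library sizes will differ.

```python
def library_payload_mb(num_libraries: int, mb_per_target: float, num_targets: int) -> float:
    """Total size if every library embeds one code object per GPU target."""
    return num_libraries * mb_per_target * num_targets


# Made-up inputs: 10 libraries, 50 MB of kernels per library per target.
LIBS = 10
MB_PER_TARGET = 50.0

for targets in (1, 4, 8):
    print(targets, "targets ->", library_payload_mb(LIBS, MB_PER_TARGET, targets), "MB")

# With a portable IR each library ships one blob regardless of GPU count
# (assuming, for simplicity, the IR blob is about the size of one target).
print("IR ->", library_payload_mb(LIBS, MB_PER_TARGET, 1), "MB")
```

The point is only that the machine-code approach grows linearly with the number of supported chips, while the IR approach stays flat.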

I do believe that running oneAPI on AMD is possible, but it still needs HIP/ROCm? Wonder if it would be possible to bake an L0 backend for AMD that just uses SPIR-V like the Intel stuff does, side-stepping this issue entirely.

Frankly I wish AMD and Intel just started working together more on this stuff. Both of them stand to gain from a cross-vendor standard that works well.