by rowanG077 1277 days ago
I'm not claiming it's not important and it's not very nice that you say I did.

When you have basically doomed humanity to rely on a single (malicious) company for a technology as important as AI, then maybe, just maybe, the trade-off that it is harder to implement is worth it.

3 comments

The tradeoff was not that. The OpenCL software ecosystem was just not there at all. It's not a coincidence that nobody has a good AI training stack on OpenCL even today. The cross-vendor infrastructure for that doesn't exist.

And NV was far from malicious here; they are the ones who made building this ecosystem possible.

Without NV, what would plausibly have happened is not AI training on GPUs at all, but on bespoke accelerators (which _did_ exist back then) at a cost totally inaccessible to customers. It's hard to overstate their role in building this ecosystem.

What exactly is the issue? I use OpenCL without significant issues every day.

The whole library ecosystem is the issue: for example (though far from the only one), if you want a BLAS. OpenCL itself only provides much lower-level building blocks.

CUDA also has unmatched performance compared to the alternatives.

> When you basically have doomed humanity to rely on a single (malicious) company for a technology that is as important as AI

I don't disagree, but how did this argument fare against Microsoft? Is there a reason you expect it to fare better against Nvidia? That sweaty guy jumping around yelling "Developers! Developers! Developers!" had a point.

Well, I wouldn't have recommended building anything foundational on .NET either. But .NET is open source and runs almost everywhere now.

I would be fine with CUDA if Nvidia allowed anyone (AMD/Intel) to make implementations for their GPUs as well.

See ROCm HIP, which is basically just that. AMD chose to rename all the function prefixes, but it's what you are asking for here.
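
To make the "renamed prefixes" point concrete, here is a toy Python sketch of what such a rename looks like. The handful of name pairs in the table are real CUDA runtime / HIP counterparts, but the `toy_hipify` helper and the table itself are hypothetical and deliberately tiny; this is not how AMD's actual porting tools are implemented, and real ports involve more than identifier renames.

```python
import re

# Small, hand-picked subset of CUDA runtime names and their HIP counterparts.
# The real mapping is far larger; this table is only illustrative.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers so that, for example, cudaMemcpyHostToDevice is
# looked up as one name rather than being rewritten through cudaMemcpy.
_IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to HIP ones; leave everything else alone."""
    return _IDENTIFIER.sub(
        lambda m: CUDA_TO_HIP.get(m.group(0), m.group(0)), source
    )


cuda_src = (
    "cudaMalloc(&d_x, n); "
    "cudaMemcpy(d_x, x, n, cudaMemcpyHostToDevice); "
    "cudaFree(d_x);"
)
print(toy_hipify(cuda_src))
```

Identifiers that are not in the table (user variables, or a name like `my_cudaFree`) pass through unchanged, which is why the lookup is done per whole identifier instead of with a blind string replace.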

AMD fucked up by not having a stable IR between GPU generations and not having a public Windows SDK. But that's their own problem, not NVIDIA's.

> AMD fucked up by not having a stable IR between GPU generations

The lack of a stable IR is probably deliberate. Much like the "we won't support DLLs or pluggable APIs, only statically compiling it into your application" with FSR2, once you port to HIP you're locked in. AMD wants you working in HIP, compiling from HIP, not treating them as an IR - they don't want to be an alternate runtime for NVIDIA's ecosystem.

And again, much like FSR2, they are in fact willing to compromise end-user experience (no updates) or developer convenience (continual patching) in order to do it. No libraries, only distribute as source, ever.

It's not about library pluggability or runtime compatibility (after all, GPU Ocelot already existed); what they want is you building the ROCm ecosystem and not the CUDA ecosystem or oneAPI ecosystem.

That's understandable from a corporate strategy perspective: as a corporation you don't want to be building a product on someone else's platform, because that gives the platform owner a lot of freedom to fuck with you. But like, the whole "we won't even do libraries/IR" is a little crass from a customer experience/developer experience perspective, and it kinda goes against the whole good-guy-AMD mythos they've built up.

The problem described is not that you have to statically link your HIP kernels. (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

The problem is that using machine code makes it machine-specific. So if I compile a HIP program for my gfx803 (RX 580) card, I won't be able to run the same binary on someone else's 6800 XT (gfx103x) system. (I think technically you can put both in a single binary, but that's still a terrible solution).

CUDA instead ships NVPTX, which is an IR that can be compiled by the driver to machine code as long as the GPU has the right compute capability, similar to how it works in the graphics world (you submit your GLSL/HLSL source code or SPIR bytecode to the driver, which compiles it for the right GPU).
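
The difference between shipping machine code and shipping an IR can be sketched as a toy Python model. Everything here is hypothetical and only illustrates the argument above: `MachineCodeBundle`, `IRBundle`, `toy_driver_jit`, the file names, the `arch_*` names and the capability numbers are made up, and no real driver or runtime API is being modeled. `gfx1030` is used as the concrete architecture name for the 6800 XT.

```python
from dataclasses import dataclass, field


def toy_driver_jit(ir: str, device_arch: str) -> str:
    """Stand-in for a driver compiling portable IR for the installed GPU."""
    return f"{ir}@{device_arch}"


@dataclass
class MachineCodeBundle:
    """App binary holding precompiled kernels, one per GPU architecture."""
    kernels: dict = field(default_factory=dict)  # arch name -> kernel image

    def load(self, device_arch: str) -> str:
        try:
            return self.kernels[device_arch]
        except KeyError:
            raise RuntimeError(f"no kernel image for {device_arch}") from None


@dataclass
class IRBundle:
    """App binary holding one IR blob that is compiled at load time."""
    ir: str
    min_capability: int  # made-up capability level the IR requires

    def load(self, device_arch: str, device_capability: int) -> str:
        if device_capability < self.min_capability:
            raise RuntimeError(f"{device_arch} lacks required capability")
        return toy_driver_jit(self.ir, device_arch)


def can_run(load, *args) -> bool:
    """Report whether loading the kernel succeeds on the given device."""
    try:
        load(*args)
        return True
    except RuntimeError:
        return False


# Built only for gfx803: fine on that card, unusable on a newer one.
hip_app = MachineCodeBundle({"gfx803": "vector_add.gfx803.bin"})
print(can_run(hip_app.load, "gfx803"))
print(can_run(hip_app.load, "gfx1030"))

# One IR blob: loads on any device that meets the capability floor,
# including architectures that did not exist when the app was built.
ir_app = IRBundle(ir="vector_add.ir", min_capability=60)
print(can_run(ir_app.load, "arch_a", 61))
print(can_run(ir_app.load, "arch_b", 86))
print(can_run(ir_app.load, "arch_old", 50))
```

Putting several architectures into `kernels` corresponds to the "both in a single binary" workaround: it helps for GPUs known at build time, but a GPU released afterwards still has no entry.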

Intel's oneAPI/Level Zero API ships SPIR-V, afaik (or maybe regular SPIR?). oneAPI can also work on top of OpenCL instead of L0. SPIR-V is neat because it's an open standard, so in theory you can get L0 working on non-Intel GPUs (and iirc Intel also uses it for e.g. FPGAs). But both SPIR-V and NVPTX solve the "machine-specific" problem AMD has.

Old SPIR is dead (was an LLVM dialect), oneAPI L0 uses SPIR-V.

> (I think they even have https://gpuopen.com/orochi/ which explicitly allows compiling a single binary for both ROCm and CUDA).

Orochi sidesteps this problem... by only supporting NVRTC-style runtime compilation with C++ as input.

And even then, the HIP C++ compiler library is bundled as part of Orochi instead of being part of the app. This means that your app using Orochi will not run on a future GPU gen unless it's updated against a newer Orochi runtime.

We haven't doomed anything... These things happen in cycles. Companies try to force control and compliance, and then customers look for alternatives... The cost has become worth it at this point, and we have found our point of inflection.