Hacker News new | ask | show | jobs
by pjmlp 440 days ago
And is stuck with C99, versus C++20, Fortran, Julia, Haskell, C#, anything else someone feels like targeting PTX with.
1 comments

Technically, OpenCL can also include inline PTX assembly in kernels (unlike any compute shader API I've ever seen), which is relevant for targeting things like tensor cores. You're absolutely right about the language limitation, though.
At which point why bother, PTX is CUDA.
Generally, the reason to bother with this approach is if you have a project that only needs tensor cores in a tiny part of the code and otherwise benefits from the cross platform nature of OpenCL, so you have a mostly shared codebase with a small vendor-specific optimization in a kernel or two. I've been in that situation and do find that approach valuable, but I'll be the first to admit the modern GPGPU landscape is full of unpleasant compromises whichever way you look.