by suuuuuuuu 62 days ago
CUDA is not C++. CUDA for GPU kernels is its own language. That's the actual problem requiring new languages or abstractions.

Says those that don't know CUDA.

You can program CUDA in standard C++20, with CUDA libraries hiding the language extensions.
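
As a rough illustration of the kind of code meant here, below is a minimal sketch of SAXPY written only with a standard parallel algorithm. The function name `saxpy` and its signature are made up for this example. The assumption is that a compiler with GPU offload for the parallel algorithms (for instance NVIDIA's `nvc++` with `-stdpar=gpu`) is what turns the parallel policy into a GPU kernel; with an ordinary host compiler the same code simply runs on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <utility>
#include <vector>

// SAXPY (y = a * x + y) using only standard C++ (C++17 or later).
// No __global__, no <<<...>>> launch syntax, no explicit device memory management.
// The execution policy is a parameter, so the same function can be called with
// std::execution::seq (sequential) or std::execution::par_unseq (parallel).
// ASSUMPTION: whether a parallel policy runs on a GPU depends entirely on the
// compiler and flags used, not on anything in this source file.
template <class Policy>
void saxpy(Policy&& policy, float a,
           const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // Binary transform: reads x[i] and y[i], writes the result back into y[i].
    std::transform(std::forward<Policy>(policy),
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling `saxpy(std::execution::par_unseq, 2.0f, x, y)` requests parallel execution. Note that nothing in this interface exposes block sizes, shared memory, or kernel fusion, which is the tuning question raised further down the thread.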

I love when C and C++ dialects are C and C++ when it matters, and not when it doesn't help to sell the ideas being portrayed.

Sorry, I wasn't aware of these developments (having abandoned CUDA for hardware-agnostic solutions before 2020). It doesn't change my point anyway, if it's specific to a single vendor.

I'm extremely dubious that such an opaque abstraction can actually solve the (true) problem. "Not having to write CUDA" is not enough - how do you tune performance? Parallelization strategies, memory prefetching and arrangement in on-chip caches, when to fuse kernels vs. not... I don't doubt the compiler can do these things, but I do doubt that it can know at compile time what variants of kernel transformations will optimize performance on any given hardware. That's the real problem: achieving an abstraction that still gives one enough control to achieve peak performance.

Edit: you tell me if I'm wrong, but it seems that std::par can't even use shared memory, let alone let one control its usage? If so, then my point stands: C++ is not remotely relevant. Again, avoiding writing CUDA (etc.) doesn't solve the real problem that high-performance language abstractions aim to address.

So what would be such an HPC language that you're so fond of? A quick web search reveals only languages that use C++/CUDA code as a back end (Python), are new and experimental (Julia), or are Fortran. For what you're talking about, none seem all too good, so you've piqued my curiosity.
See https://arxiv.org/abs/2512.17101. I've used some of the tools in the stack they describe (and see Sec 2 for an overview of others). JAX/XLA/etc. are somewhat similar, though still without user control over transformations.

Perhaps part of the reason for the bad takes in this thread is that "language" is being taken too literally (perhaps also the fault of the linked blog post itself). I think one thesis of the above tooling is that, when tuning and generating code (CUDA, OpenCL, what have you) at runtime, the best "languages" for these abstractions are, amusingly, scripting languages like Python. Having CUDA/etc. as a back end without having to hand-write/-transform/-optimize it is indeed the point.

If CUDA is C++ then I'd like to know how you throw and catch exceptions in CUDA kernels.
The same way Google employees writing C++ do, including on LLVM and Chrome.
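
Presumably the point is that those codebases are built with exceptions disabled and report failures through return values instead, and nobody argues they are not C++. A minimal sketch of that style, using a hypothetical `mean` helper invented for this example:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: computes the arithmetic mean without throwing.
// An empty input is reported as std::nullopt, so the caller must check the
// result explicitly instead of catching an exception.
std::optional<double> mean(const std::vector<double>& values) {
    if (values.empty()) {
        return std::nullopt;  // error path: no throw
    }
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}
```

A caller writes `if (auto m = mean(data)) { use(*m); } else { /* handle the error */ }`, where `use` is a placeholder for whatever consumes the value. The same pattern of returning a status or optional value carries over to code that cannot throw at all.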