Increasingly (for instance, on the ADSP podcast [1]), those in Nvidia's inner circle are advocating against writing your own CUDA kernels (unless that's your full-time job at Nvidia, that is).
That would be cool, but Nvidia released Blackwell and still hasn't released unbroken kernels for sm120. Since sm120 is not the data center GPU, it doesn't get the love. My point is that, unfortunately, we can't depend on Nvidia to do the right thing.
It’s not about whether you work at Nvidia. Avoid writing CUDA kernels if there are higher level libraries that do what you need. Do write CUDA kernels if you want to learn how, or if you need the low level control, or to micro-optimize. Being able to fuse kernels to avoid memory traffic or get better specialization is also a reason to reach for raw CUDA. Just consider what’s the right tool for the job…
Writing efficient CUDA code is very, very difficult; most CUDA code is not actually good at utilizing the hardware. It is much easier to write performant code in higher level languages (and most people are doing exactly this).
That all depends on what you’re doing. Like I said, if a high level lang or lib supports and fits your goal well, then yes you should use it. I don’t know what most people are doing, but it’s fair to say that a lot of people can use a higher level language.
If you’re trying to learn CUDA, then using a higher level language is not the best approach. If you’ve already used a high level language and found that performance is lacking, and that it would improve if you could fuse some of your kernels and avoid some of the memory round-trips, then moving to something lower level is called for.
I’m suggesting it’s better to think about your goals for one minute and understand the basic choices than to assume there’s something that works for everyone’s goals. Higher level languages don’t meet everyone’s goals.
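To put rough numbers on the fusion point, here is a toy CPU-side model in Python. It is not CUDA and measures nothing real. Each function stands in for one kernel launch, and the hypothetical `Traffic` counter stands in for global-memory reads and writes. The operation (`relu(a * x + b)`) and every name here are assumptions picked for illustration.

```python
# Toy model of kernel fusion. Each function stands in for one GPU kernel
# launch; Traffic counts element-sized global-memory accesses.
# All names are hypothetical; this only illustrates the accounting.

class Traffic:
    def __init__(self):
        self.reads = 0
        self.writes = 0

    @property
    def total(self):
        return self.reads + self.writes


def scale_shift(x, a, b, t):
    """'Kernel' 1: tmp[i] = a * x[i] + b, with tmp written back to memory."""
    out = []
    for v in x:
        t.reads += 1           # load x[i]
        out.append(a * v + b)
        t.writes += 1          # store tmp[i]
    return out


def relu(x, t):
    """'Kernel' 2: out[i] = max(tmp[i], 0), reading tmp back from memory."""
    out = []
    for v in x:
        t.reads += 1           # load tmp[i]
        out.append(v if v > 0.0 else 0.0)
        t.writes += 1          # store out[i]
    return out


def unfused(x, a, b, t):
    """Two launches: the intermediate makes a full round trip through memory."""
    return relu(scale_shift(x, a, b, t), t)


def fused(x, a, b, t):
    """One launch: the intermediate stays in a local (a register, on a GPU)."""
    out = []
    for v in x:
        t.reads += 1           # load x[i]
        r = a * v + b          # never written to memory
        out.append(r if r > 0.0 else 0.0)
        t.writes += 1          # store out[i]
    return out


x = [-2.0, -0.5, 0.0, 1.5]
t_unfused, t_fused = Traffic(), Traffic()
assert unfused(x, 2.0, 1.0, t_unfused) == fused(x, 2.0, 1.0, t_fused)
print(t_unfused.total, t_fused.total)  # 16 8
```

Same result, half the memory accesses in this model (4n versus 2n for n elements). For a memory-bound elementwise chain, that is the kind of saving you might go after by writing the fused kernel yourself when a library only hands you the two separate ones. Real speedups depend on the hardware and on whether the kernels are actually memory-bound.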
That advice seems like nonsense. It's like saying avoid C because you can use Python, or avoid writing a graphics engine because you can license Unreal.
can very much agree about not writing stuff like reductions yourself, unless you have good reason to.
but this sort of feels like another "implement everything with <nvidia stuff> and you'll have a great time!! (but also coincidentally get locked in even more to Nvidia hardware)"