|
|
|
|
|
by saagarjha
218 days ago
|
|
Performant Triton programs are usually simpler and shorter than their CUDA equivalents. This alone makes it easier to write, and I would argue that it helps with debugging too because the model provides a lot more guarantees on how your code executes. That said, some of the tooling is notably poor (such as cuda-gdb support). |
|
Getting rid of the for loop over an array index doesn't make it easier to understand the hard parts. Losing the developer perf and debug tooling is absolutely not worth the tradeoff.
For me I'd rather deal with Jax or Numba, and if that still wasn't enough, I would jump straight to CUDA.
It's possible I'm an old fogey with bias, though. It's true that I've spent a lot more time with CUDA than with the new DSLs on the block.