by alecco 43 days ago
> directly to PTX

Weird. There's a recent NVIDIA MLIR that is quite good and fast. Or they could target the even easier, more recent and fashionable tile IR [1] used by CuTile [2]. It's a little higher level but significantly easier to target, and it only loses on epilogue fusion and similar.

[1] https://docs.nvidia.com/cuda/tile-ir/

[2] https://developer.nvidia.com/cuda/tile