by devit 2040 days ago
> VkFFT is able to match and outperform cuFFT on the whole tested range from 2^7 to 2^28 in single precision

What is your explanation for this?

Is the VkFFT algorithm better? Is SPIR-V fundamentally more expressive than PTX? Are nVidia drivers better at compiling SPIR-V than PTX?

Have you compared the generated GPU assembly from both?

1 comment

FFT is an extremely bandwidth-limited problem, so if most of the time in both algorithms is taken by a single upload, the overall time will be similar. A more in-depth analysis of how VkFFT and cuFFT scale with memory clocks and bandwidth can be found here: https://www.reddit.com/r/nvidia/comments/jxlbjs/rtx_3090_ove...
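To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from VkFFT or cuFFT. It assumes each pass reads and writes every single-precision complex element (8 bytes) exactly once, and that nothing else costs time. The function name, the `passes` parameter, and the 900 GB/s figure are all placeholders of mine, not measured values.

```python
def min_fft_time_s(n, bandwidth_bytes_per_s, passes=1, bytes_per_element=8):
    """Lower bound on runtime if memory traffic is the only cost.

    Assumes every pass reads each element once and writes it once.
    bytes_per_element=8 corresponds to a single-precision complex number.
    """
    bytes_moved = 2 * n * bytes_per_element * passes  # one read + one write per pass
    return bytes_moved / bandwidth_bytes_per_s


# Hypothetical memory bandwidth, chosen only for illustration.
ASSUMED_BANDWIDTH = 900e9  # bytes per second

for log2n in (7, 20, 28):
    n = 2 ** log2n
    t = min_fft_time_s(n, ASSUMED_BANDWIDTH)
    print(f"N = 2^{log2n}: at least {t * 1e6:.3f} microseconds per transform")
```

Under this model, any two implementations that move the same number of bytes hit the same floor, whatever they compile to. Very small sizes will likely be dominated by launch overhead instead, and large sizes may need more than one pass, which the `passes` argument only crudely approximates.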

I don't know exactly what cuFFT does differently, but I am fairly certain it uses a very similar memory layout and very similar algorithms under the hood (judging by execution times only).

The main takeaway should be that Vulkan allows low-level memory control with similar performance, while being cross-platform and open source. I don't think SPIR-V is more expressive; I bet Nvidia wouldn't allow that. But that doesn't stop it from still being good.