by datenwolf 2090 days ago
I'm aware of all of that. And yes, we're heavily dependent on synchronization. However, we also spent a lot of time tinkering with the launch parameters and properly interleaving all synchronization events and fences, because of our low-latency requirements.

Find our original publication here: https://doi.org/10.1364/BOE.5.002963

Since then we've improved on that. For the resampling and complex tonemapping, we determined empirically that a grid of 128 threads, each processing a whole line, achieves the best throughput; there's a 2D parameter space of possible launch configurations, and we brute-force the whole thing (so far I haven't benchmarked the RTX20xx and RTX30xx GPUs, but the result was consistent from the GTX690 to the GTX1080). The FFT plan is whatever cufftPlan1d produces for a single-axis transform over a 2D array: usually a 2048-point FFT, but with up to 4096 lines (well, technically whatever the maximum dimension for 3D textures is).
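A minimal sketch of that kind of exhaustive launch-configuration search, in Python. The two axes are an assumption (the comment doesn't say which two parameters span the space; think e.g. threads per block and lines per thread), and `benchmark` is a hypothetical callback that launches the kernel with the given configuration and returns measured throughput.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def brute_force_launch_config(
    benchmark: Callable[[int, int], float],
    axis_a: Iterable[int],
    axis_b: Iterable[int],
    repeats: int = 3,
) -> Tuple[Tuple[int, int], float]:
    """Evaluate every (a, b) pair in the 2D space and return the best one.

    `benchmark(a, b)` is a hypothetical callback: it should launch the kernel
    with configuration (a, b) and return throughput (higher is better).
    """
    best_cfg: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for a, b in product(axis_a, axis_b):
        # Keep the best of several runs to damp out timing jitter.
        score = max(benchmark(a, b) for _ in range(repeats))
        if score > best_score:
            best_cfg, best_score = (a, b), score
    if best_cfg is None:
        raise ValueError("empty parameter space")
    return best_cfg, best_score
```

Taking the best of several repeats per point is one way to keep timing noise from picking the wrong winner; for a small 2D grid of candidates, an exhaustive sweep is simple and guaranteed to find the best point among those tried.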

> Do you launch a big grid that consists of multiple samples combined in a matrix

Of course!

> or you launch each sample separately?

Of course not, that'd be stupid.
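To illustrate the difference: a batched plan treats the whole matrix as one transform request, with line i stored contiguously at offset i * n. That is the layout cufftPlan1d assumes when you pass the number of lines as its batch argument (e.g. cufftPlan1d(&plan, 2048, CUFFT_C2C, lines)). The sketch below is only a pure-Python CPU reference of that layout, using a textbook radix-2 FFT; it is not how cuFFT computes the transform, and the function names are hypothetical.

```python
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def batched_fft_rows(buf: List[complex], n: int, batch: int) -> List[complex]:
    """Transform `batch` lines of `n` points each, stored row-major in one
    flat buffer (line i starts at offset i * n), as a single request."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    if len(buf) != n * batch:
        raise ValueError("buffer size must equal n * batch")
    out: List[complex] = []
    for i in range(batch):
        out.extend(fft(buf[i * n:(i + 1) * n]))
    return out
```

On the GPU, all lines of the batch are processed by the plan's own launch(es) rather than one launch per line, which is why launching each sample separately throws away most of the available parallelism.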


Well, then I most likely won't be able to help explain the fluctuation easily, as you have already spent a lot of time on it. It would be cool to try VkFFT in this usage scenario at some point in the future, though; it can also do 1D FFTs of sequences grouped in a matrix.
As I already mentioned over at https://www.reddit.com/r/vulkan/comments/i2ivzh/new_vulkan_f..., I'm going to do that, and I'll let you know how it goes.
If you happen to need any assistance in refining VkFFT for your use case, feel free to contact me.