|
|
|
|
|
by DTolm
2092 days ago
|
|
GPU is a very consistent device, so the purpose of such big sample sizes and multiple launches with averaging is to reduce all the deviations almost to zero. The error is <1% in this case and showing it on the plot will not really change it. The values, however, change when I update the code and improve it, so this is by no means the final way the benchmark will look like. I will think on how to adress this better in the future, but for now I think the best solution if you doubt the results is to launch VkFFT and see what it outputs for yourself. |
|
You'd think that, but I found all GPUs I'm using here to exhibit multimodal distribution of execution times in the FFT (this is for the cuFFT codepath). The GTX980 (not shown in the plot) and the Titan-X even have very prominent outliers. This is a figure that's going to be in the paper I'm currently writing:
https://dl.datenwolf.net/gpu_oct_benchmark_plots.pdf
I'm comparing the OCT processing execution times (with HOT caches, mind you) between a Titan-X and a GTX1080. The difference also shows up very prominently when looking at the kernel scheduling order as reported by NVPP.