|
|
|
|
|
by AndrewPGameDev
1037 days ago
|
|
Interesting timing on posting this to HN, I've recently been optimizing my WebGPU LSD radix sort. Today I measured it against the Thrust CUDA version, and it's about 10x slower (15ms to 1.5ms). My goal was to try to get 10 million elements in 1 ms, but now that I know 3 million in 1.5ms is impossible even for Thrust I know I won't be able to beat that. |
|
AFAIK Thrust is intended to simplify GPU programming. It could well be that for specific use cases, in particular when it is possible to fuse multiple operations into single kernels, you could outperform Thrust.