Hacker News new | ask | show | jobs
by winwang 412 days ago
It's 32 32-bit values which get sorted. I don't think a GPU sort would beat a CPU sort at this scale, even if you don't take kernel launch time into account. CPUs are simply too fast for (super-)small data, especially with AVX-512. But if we're talking about a larger amount of data, that would be a different story, i.e. as part of a normal gpu mergesort.
2 comments

It is also useful if your data already lives on the GPU memory. For example, when you need to z-sort a bunch of particles in a 3d renderer particle system.
A 32 way GPU sorting algorithm might be just what I need for sorting and deduplicating triangle id's in a visibility buffer renderer I am working on.

Thanks for sharing.

As someone who doesn't know very much about graphics (ironically), you're welcome and hope it helps!