by winwang
412 days ago
It's 32 32-bit values that get sorted. I don't think a GPU sort would beat a CPU sort at this scale, even if you ignore kernel launch time. CPUs are simply too fast for (super-)small data, especially with AVX-512.
But if we're talking about a larger amount of data, it would be a different story, i.e. with this as part of a normal GPU mergesort.
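
For a sense of how little work the 32-value case is: 32 32-bit values are 128 bytes, which is two 512-bit registers' worth of data. Here is a rough scalar sketch of a bitonic sorting network for exactly that size. The name `bitonic_sort32` is made up for illustration, and this is not the code being discussed; an actual AVX-512 version would use intrinsics instead of the inner loop.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sorts exactly 32 uint32_t values in ascending order
// using a bitonic sorting network (fixed, data-independent comparisons).
void bitonic_sort32(std::array<std::uint32_t, 32>& v) {
    constexpr std::size_t n = 32;
    // k: length of the bitonic sequences being merged in this phase.
    for (std::size_t k = 2; k <= n; k *= 2) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {  // handle each pair once
                    // Blocks alternate direction so merged runs stay bitonic.
                    const bool ascending = (i & k) == 0;
                    const std::uint32_t lo = std::min(v[i], v[partner]);
                    const std::uint32_t hi = std::max(v[i], v[partner]);
                    v[i] = ascending ? lo : hi;
                    v[partner] = ascending ? hi : lo;
                }
            }
        }
    }
}
```

For n = 32 that is 15 rounds of 16 independent compare-exchanges each, with no data-dependent branching, which is the kind of pattern that maps well onto wide SIMD min/max instructions.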