|
|
|
|
|
by williamkuszmaul
1478 days ago
|
|
Impressive! There are already implementations of sample sorting that are much faster than c++ sort (but I don't recall how much faster). I'd be very interested in a comparison to some of those... Also, since we are sorting integers, I'd also be interested to know how well a modern implementation of radix sort can be made. |
|
We integrate vqsort into ips4o, calling it once subarrays fit in L2. That speeds up ips4o by 2.9x (single thread) and 1.6 (parallel - bandwidth limited). More details in the paper.
I previously also worked on radix sort: https://arxiv.org/abs/1008.2849 Interestingly, that 12 year old result on dual-socket/8 cores is only about twice as fast as a single AVX-512 core today.