Hacker News new | ask | show | jobs
by ipsofacto 2217 days ago
There's an even better branch-free (super scalar) sorting algorithm: "In-place Parallel Super Scalar Samplesort (IPS4o)" which we started using:

https://github.com/SaschaWitt/ips4o

https://arxiv.org/abs/1705.02257

As an example, to sort 10 million random longs on my computer it takes std::sort 766 ms (roughly in line with Andrei's numbers) and ips4o::sort takes 274 ms.

[edit:formatting]

2 comments

2x-3x improvement doesn't seem like much for something that runs in parallel.
The 274 ms is for the sequential version of the algorithm (on my i7-9750 laptop).

On my office Xeon E5-2690 machine when using multiple threads the runtime decreases like this

840 ms for std::sort

372 ms for IPS4o sequentially

201 ms for 2 threads

104 ms for 4 threads

53 ms for 8 threads

33 ms for 16 threads

So starting off better on a single, and then scaling well with threads. Nice!
That's really interesting!

How does it behave with almost-sorted data?

There's some logic to detect if the data is already sorted or reverse sorted, but I think that's only triggered at the very beginning and not at every recursion level. If the data is almost sorted it seems to become a little bit faster. Take a look at the appendix of the paper where there are timings for various distributions.

There's also some logic to handle the case where there are a lot of equal elements which also results in faster performance than the completely random case.

You might want to mention if are affiliated with the algorithm you are promoting. ipsofacto ~ ips4o
Or maybe the throwaway was named after the comment it was intended to be used to post.
No affiliation, just picked the name because it fit... The company I work at was exploring various sort algorithms to use and this turned out to be the fastest.

Another benefit of this algorithm is that it is parallizes extremely well.