| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ipsofacto 2266 days ago

There's an even better branch-free (super scalar) sorting algorithm: "In-place Parallel Super Scalar Samplesort (IPS4o)" which we started using:

https://github.com/SaschaWitt/ips4o

https://arxiv.org/abs/1705.02257

As an example, to sort 10 million random longs on my computer it takes std::sort 766 ms (roughly in line with Andrei's numbers) and ips4o::sort takes 274 ms.

[edit:formatting]

2 comments

phkahler 2266 days ago

2x-3x improvement doesn't seem like much for something that runs in parallel.

link

ipsofacto 2266 days ago

The 274 ms is for the sequential version of the algorithm (on my i7-9750 laptop).

On my office Xeon E5-2690 machine when using multiple threads the runtime decreases like this

840 ms for std::sort

372 ms for IPS4o sequentially

201 ms for 2 threads

104 ms for 4 threads

53 ms for 8 threads

33 ms for 16 threads

link

phkahler 2266 days ago

So starting off better on a single, and then scaling well with threads. Nice!

link

bluescarni 2265 days ago

That's really interesting!

How does it behave with almost-sorted data?

link

ipsofacto 2265 days ago

There's some logic to detect if the data is already sorted or reverse sorted, but I think that's only triggered at the very beginning and not at every recursion level. If the data is almost sorted it seems to become a little bit faster. Take a look at the appendix of the paper where there are timings for various distributions.

There's also some logic to handle the case where there are a lot of equal elements which also results in faster performance than the completely random case.

link

goldenkey 2266 days ago

You might want to mention if are affiliated with the algorithm you are promoting. ipsofacto ~ ips4o

link

ComputerGuru 2266 days ago

Or maybe the throwaway was named after the comment it was intended to be used to post.

link

ipsofacto 2266 days ago

No affiliation, just picked the name because it fit... The company I work at was exploring various sort algorithms to use and this turned out to be the fastest.

Another benefit of this algorithm is that it is parallizes extremely well.

link