Hacker News new | ask | show | jobs
by phkahler 2217 days ago
2x-3x improvement doesn't seem like much for something that runs in parallel.
1 comments

The 274 ms is for the sequential version of the algorithm (on my i7-9750 laptop).

On my office Xeon E5-2690 machine when using multiple threads the runtime decreases like this

840 ms for std::sort

372 ms for IPS4o sequentially

201 ms for 2 threads

104 ms for 4 threads

53 ms for 8 threads

33 ms for 16 threads

So starting off better on a single, and then scaling well with threads. Nice!
That's really interesting!

How does it behave with almost-sorted data?

There's some logic to detect if the data is already sorted or reverse sorted, but I think that's only triggered at the very beginning and not at every recursion level. If the data is almost sorted it seems to become a little bit faster. Take a look at the appendix of the paper where there are timings for various distributions.

There's also some logic to handle the case where there are a lot of equal elements which also results in faster performance than the completely random case.