| > Keep in mind that glidesort is partly derived from fluxsort and many of its techniques were directly inspired by fluxsort/quadsort, while several others are implemented in a similar or near-identical manner. This is in my opinion not adequately explained in the presentation on youtube. We've had an extensive email discussion on this, and I still have to publicly provide counterweight. Yes, I did look at fluxsort/quadsort (mostly the README descriptions) while developing glidesort, I did get inspired by fluxsort's stable quicksort performance. And I did take quadsorts 'parity merge' idea for small sorts, as I described here 9 months ago: https://news.ycombinator.com/item?id=31101498. I'm open about this. But you're not the first to do branchless merging, I was already aware of that in "Branch Mispredictions Don't Affect Mergesort". Ping-pong merges turns out were already described in "Patience is a Virtue: Revisiting Merge and Sort on Modern Processors". The observation that the reason the 'parity merge' was so fast being instruction-level and memory-level parallelism is my own (you added this to the README after I publicly described this in the above linked comment). That we can then use it more generally as a bidirectional merge is my own. Using merge splitting and lazy merging to then apply this to more merges is my own. Interleaving multiple independent merge loops is my own. Extending the quicksort to then also use bidirectional partitioning is also my own idea (I don't have a crystal ball about you implementing and rejecting the idea in 2021). My pivot selection is my own. The whole extension of powersort to use lazy merging and more merge splitting for ping-pong merge opportunities is my own. The novel robustly adaptive combination of merge and quicksort is my own. I've spent months and months optimizing glidesort. Glidesort tackles problems fluxsort/quadsort don't even have to worry about in being unable to copy values, only move them, and dealing with panics from user-provided comparator functions, and comes out with minimal losses. You do great work Igor, but I would appreciate it if you would stop try to claim mine as (mostly) yours. |
"Since the parity merge can be unrolled it's very suitable for branchless optimizations to speed up the sorting of random data. Another advantage is that two separate memory regions can be accessed in the same loop with no additional overhead. This makes the routine up to 2.5 times faster on random data."
I described what made it fast and it was obvious there was general applicability. If I understanding your reasoning correctly, because you took inspiration from my more advanced unguarded two directional merge, you feel like a guarded two directional merge is some new novel discovery.
This is a very contentious line of reasoning.
Subsequently you then proceed to give me zero credit for branchless merging in your Youtube presentation, instead referencing a branchless merge that gave a 10-20% speed-up instead of quadsort's 150% speed-up. You also claimed in the presentation there were no bidirectional branchless algorithms written to date, this is simply not true.
This does upset me. It would be a simple matter, and the right thing, to point out that quadsort introduced bidirectional branchless merging.
As for your work to increase the memory regions from 2 to 4. This is easily conceived, but indeed quite challenging to actually implement.