| If the last presentation's benchmark still goes, blitsort is significantly faster. Keep in mind that glidesort is partly derived from fluxsort and many of its techniques were directly inspired by fluxsort/quadsort, while several others are implemented in a similar or near-identical manner. This is in my opinion not adequately explained in the presentation on youtube. The main difference is that glidesort uses timsort's approach to adaptive sorting rather than quadsort's, though it uses quadsort's novel method for cost-effective branchless merging. Glidesort tries to access 4 memory regions instead of 2, primarily with the bidirectional partition, an approach I implemented in 2021, but rejected because it was causing degraded performance for expensive comparisons. This does however result in a very different visualization which can give the impression it is unrelated. The general approach to stable partitioning is likely nearly identical to fluxsort. As for why blitsort is fast, it's likely due to trinity rotations and monobound binary searches. Possibly also a more optimal overall partitioning strategy. I haven't spend too much time on blitsort however, so there are likely further improvements to be made. |
We've had an extensive email discussion on this, and I still have to publicly provide counterweight.
Yes, I did look at fluxsort/quadsort (mostly the README descriptions) while developing glidesort, I did get inspired by fluxsort's stable quicksort performance. And I did take quadsorts 'parity merge' idea for small sorts, as I described here 9 months ago: https://news.ycombinator.com/item?id=31101498. I'm open about this.
But you're not the first to do branchless merging, I was already aware of that in "Branch Mispredictions Don't Affect Mergesort". Ping-pong merges turns out were already described in "Patience is a Virtue: Revisiting Merge and Sort on Modern Processors".
The observation that the reason the 'parity merge' was so fast being instruction-level and memory-level parallelism is my own (you added this to the README after I publicly described this in the above linked comment). That we can then use it more generally as a bidirectional merge is my own. Using merge splitting and lazy merging to then apply this to more merges is my own. Interleaving multiple independent merge loops is my own. Extending the quicksort to then also use bidirectional partitioning is also my own idea (I don't have a crystal ball about you implementing and rejecting the idea in 2021). My pivot selection is my own. The whole extension of powersort to use lazy merging and more merge splitting for ping-pong merge opportunities is my own. The novel robustly adaptive combination of merge and quicksort is my own. I've spent months and months optimizing glidesort. Glidesort tackles problems fluxsort/quadsort don't even have to worry about in being unable to copy values, only move them, and dealing with panics from user-provided comparator functions, and comes out with minimal losses.
You do great work Igor, but I would appreciate it if you would stop try to claim mine as (mostly) yours.