| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mlochbaum 488 days ago

So the idea is that you'll eventually arrange n elements into ideally about √n sorted lists, organized so that the endpoints of these lists are also ordered (the "rainbow" shape). A new element goes on the end of one of those lists or a new list; which one is generally decided by binary search, but the index is saved to speed up the next insertion if it's the same. So a natural run might benefit by having adjacenct values inserted to the same list.

Which cases benefit and which don't seems potentially quite erratic. If you have a series of runs that shrink over the course of the array, with increasing start value and decreasing start value, each will become a new sorted list and you end up with O(n) insertion plus a perfect merge. If the runs get wider instead, they'll cross over existing endpoints and end up spattered around before merging.

It feels like the paper keeps trying to spin constant-factor improvements as somehow being asymptotically relevant despite acknowledging that mathematically they aren't, for example the fact that log(√n) is only log(n)/2 "would not reflect the sublinear nature of base array length k" and runs hitting the best case "approximately 50% of the time... reinforces JesseSort's average runtime complexity as somewhere between O(n) and O(n log_2 n)". Powersort provably takes O(n + n log r) worst-case time where r is the number of runs; I expect this algorithm is O(n log n) worst- and average-case.

However, on random arrays the given benchmarks do improve relative to Python default as the length increases, with faster performance starting a little under a million elements. There isn't a lot of investigation into why this is. I'd be interested to know if the number of comparisons is reduced in particular. Perhaps this algorithm is more cache-friendly with a random input, but this wouldn't be reliable as many inputs can cause the number of sorted lists to scale with O(n) so that the binary searches need to work on a large array.

1 comments

mlochbaum 488 days ago

The analysis would make a lot more sense if it dropped every big O and computed an expected number of comparisons. Of course, the real issue is that the whole thing relies on an empirically-determined length of √8√n for the number of lists in the rainbow, when the theoretical justification isn't there and clearly the worst case is n/2. I'm not seeing any awareness that the expected number of comparisons has to be at least log_2(n!), and am starting to see some worrying signs that the author thinks averages like n log log n are possible. I'd initially assumed that "somewhere between O(n) and O(n log_2 n)" statement meant in the presence of asymptotically large natural runs, but there's a claim that the runs "need not be of any minimum size" and another that the algorithm can "capitalize on natural runs" that seems to be referring to the benchmarks on "purely random values".

link

mlochbaum 487 days ago

Ah, the README prior to commit b98f724 does claim O(n log(1.5 ln(n))) asymptotic runtime. This is based on a flawed analysis assuming rainbow endpoints will be evenly distributed, which the paper presents before describing the problem that each inserted element pushes one of those endpoints away from the middle. Bit of a perpetual motion machine search: initial results showed the promise of this approach, some details got in the way but surely there's some trick to work around them...

Still there's the benchmark. I remembered timsort.txt shows comparison counts relative to the log_2(n!) bound and it's only like 1% worse, so the >30% improvement over Timsort at the high end has to come from some other factor. I'm thinking it is actually cache effects—not because the algorithm has inherently better cache properties, as it's just a merge sort at large scales, but because it copies the array to ints internally rather than using a generic comparison. That would be a poor result, as good comparison sorts compiled for 4-byte data are many times faster than Timsort.

link

tim-peters 486 days ago

I want to thank you for your analysis! I'm the "timsort" guy, and I'm asked to look at all sorts of things. The devil is in the details, and I've given up bothering to look unless a paper spells out sufficient details up front. Else it's a pit I'd rather not get sucked into.

As general things, timsort was aimed at exploiting all kinds of pre-existing order, not random lists. The only goal for the latter was to be within reach of CPython's previous highly tuned samplesort implementation (like quicksort on steroids, with much less data movement than mergesorts, but more comparisons).

As you say, for randomly ordered input it gets remarkably close to the information-theoretic lower bound on # of compares. So that's not it.

"Cache effects", maybe, but that's a pit to dig into. I'll note that while the newer "powersort" merge strategy is elegant and provably near-optimal by some relevant measures, in practice it doesn't really appear to run any faster (although cases can be _contrived_ that make it - or the older method! - run faster).

Comparison specialization (to ints) could very well account for it. CPython's general comparison machinery is _very_ expensive, even for what turn out to be native machine ints (which CPython cannot know at compile-time: everything is deduced at runtime).

CPython has since grown new machinery to do a pre-pass over the list, and do a form of cheaper comparison specialization for what turn out to be homogeneous lists of suitable types (including "small enough" ints). That alone gives enough speedup to make some of the original choices sub-optimal. For example, binary insertion sort for short runs is no longer best then. That was aimed at minimizing worst-case # of compares, but as comparisons get cheaper that has less value.

There are also surprises in everything. For example, the worst case for binary insertion sort on most machines was _not_ reverse-ordered input, despite that it requires the most data movement. Instead the worst case was randomly ordered data. Why? Branch prediction. In randomly ordered data, each branch is unpredictable. In reverse-ordered data, "move to the left" is always the test outcome.

Another generality is that Python's sort cares more about speed for shorter lists than huge ones. "Big data" problems are more likely to use, e.g., extensions geared to "big data problems", like numpy for giant arrays of floats. In those contexts more suitable _algorithms_ exist, like radix sorts, or multi-key quicksorts for lists of long strings with long shared prefixes.

Python's niche is more in, e.g., web services, where a great many sorts of shorter lists are common.

Bottom line: there is no one "best" sorting algorithm. I keep an eye out for newer developments, but haven't seen anything better yet for what Python is aiming at. I was, e.g., very impressed by pdqsort - but it's not a stable sort, and that alone makes it a non-starter for general Python use.

link

mlochbaum 486 days ago

Whoa, hey Tim! Very much agree on no "best" sorting algorithm. The generic-comparison case is one I keep idly considering, although more focused on long lists than short ones. There's been a lot of improvement lately in sorting small datatypes (arguably kicked off by fluxsort), but really, who has so many 4-byte ints laying around that sorting them is a bottleneck? A point Orson Peters made when presenting glidesort[0] was that lists with many repeated duplicates are common even in generic data, as in ordering customers by city, and quicksort seems to be the best way to deal with such a list. However glidesort/driftsort is still largely oriented towards smaller types, and its mergesort-quicksort hybridization can pretty easily make bad choices, as I described at [1] and especially [2] (slightly surprised I didn't mention binary insertion sort in that first section). So this approach feels like it's still immature to me.

[0] https://archive.fosdem.org/2023/schedule/event/rust_glidesor...

[1] https://mlochbaum.github.io/BQN/implementation/primitive/sor...

[2] https://mlochbaum.github.io/BQN/implementation/primitive/sor...

link

tim-peters 486 days ago

You're much more up to date on the details of recent developments than I am. It's faded into a background interest (although a persistent one) for me.

One thing that wasn't clear to me about JesseSort: in what way(z) are Rainbows believed to be an improvement over the simpler scheme used by the older "patience sorting"? They both maintain a collection of sorted sublists merged at the end, both use binary search to find the right sublist to add "the next" array element to, and both excel at "low diversity" inputs (if there are K distinct values, patience sorting will build at most K sublists).

As I recall, full-blown patience sorting isn't "naturally stable". Searching through the JesseSort paper, I didn't find it addressed, and the details were too fuzzy to guess offhand. While this is in no way a principled objection, it just _seemed_ "too complicated" to me. Then again, my intuition for such things has served me well before ;-)

link

mlochbaum 486 days ago

I'd forgotten about patience sorting! Not that I was ever exactly familiar with it. JesseSort seems nearly identical, except that it allows insertion to either side of the lists, tries the previous index before doing a binary search, and uses pairwise instead of k-way merge at the end. The expected Θ(√n) list count is the same as well. An obvious advantage of going double-ended is that you get automatic O(n) sorting if the input's an ascending or descending run, instead of just one of those.

I think both algorithms can be made stable; if there's a barrier for patience sort it would be that the merge phase pops off the end of the sorted lists so it's going in the opposite direction as insertion? For JesseSort, an element that's equal to a previous one can only be inserted to the same list, or a more central one (created later). So you need to merge sublists giving priority to the outer ones, and also never append an element to the bottom of a sublist if it's equal to that list's current minimum. Doesn't look like JesseSort as implemented in jessesort_c.pyx does either of these: it merges non-adjacent lists and has a backwards merge by virtue of using < instead of <=, and always inserts to the innermost list possible instead of being strict about the bottom side. Which is understandable as being strict has a major downside. If you get a string of equal values in the bottom half of the range, you'd be stuck putting each one in a new sublist until you get to the midpoint and can create a new one where these values start going to the top.

link

zahlman 487 days ago

>This is based on a flawed analysis assuming rainbow endpoints will be evenly distributed

I guess you know this, but we don't need to do any complicated analysis to disprove such a big-O for a comparison-based sort. It's a well known information-theory result: each comparison can remove at most O(1) entropy, while the initial state has O(n lg n) entropy because that's how log(N!) grows (https://en.wikipedia.org/wiki/Stirling%27s_approximation).

link