Hacker News new | ask | show | jobs
by tim-peters 1806 days ago
To get more gonzo, in CPython's list.sort(), the C code that actually merges two lists (which happen, in context, to be two contiguous array slices) is in listobject.c's `merge_at(()` function. That brings in the world of "galloping" (exponential search) optimizations, much faster than one-pair-at-a-time compares in many real-world cases.

So that's a whole lot of additional complication, but it's the heart of what _really_ makes timsort shine in its "jaw dropping" cases.

Tim (of "timsort" - which was an inside joke name at the start, because I never expected anything outside of CPython would use it ;-) ).

4 comments

Wow. Thanks for writing it. I believe this is the part you are referring to: https://github.com/python/cpython/blob/main/Objects/listobje...
Right, `merge_at()`. But that in turn calls small mountains of other C code. It is, alas, a complicated approach. Folding it in would be substantially more work than you've already done.

The point: for inputs like your:

a = list(range(-100, 1700)) b = list(range(1400, 1800))

it would first use a variant of binary search to find where b[0] belongs in a. "Almost all of a" is consumed by that via a relatively tiny number of compares. It would similarly do a binary-ish search to find where a[-1] belongs in b. The tail end of b, from 1700 through 1799, then also never needs to be looked at again.

In the lucky case that two input lists don't overlap at all, and a[-1] <= b[0], that's enough so that a "one-pair-at-a-time" compare _never_ needs to be done. In context (the two lists are contiguous slices of an array), no pointers at all need to be copied in that case either - it deduces that the combined array slices are already in sorted order in total time O(log(len(a)) + log(len(b))), and it's done.

Very Cool!

Edit: Ok, I get it now. Don't just pop, binary search into it, so you can fast forward through. I'm not sure I'll get around to implementing that, at least not in the immediate future, but that makes total sense.

You already got a win - quit while you're ahead ;-) Note that plain binary search here is probably not a good idea. See "listsort.txt", in the same directory as "listobject.c", for excruciating details.
If I get it working, it can be TimMerge.
Haha. AdamMerge works for me - I get enough abuse for naming a sort after myself ;-)

If you pursue this, you can probably throw out piles of the CPython code. That's trying to keep the result "in place", so has major code near-duplication to merge "into the left side" or "into the right side", to minimize the amount of temp memory needed (this depends on which input list is shorter).

But you're writing your output to a new list, so those cases are the same to you.

To keep the sort stable in all cases, though, you still need to distinguish between `gallop_left()` and `gallop_right()`.

Gosh, comments like these from the actual author are a reminder of why I love HN. :)
I'm also amazed by the fact that that John Nagle from the popular Nagle's algorithm is still active on HN!
There are lots of known people on HN. I started tagging HN usernames with a browser extension some time ago and sometimes I spot some really interesting exchanges given the context of what the posters worked on.
I'd like to more about this. Can I use this?
Search for HN username tagging extension. There's quite a few options available.
Is this collaborative? That is, can one benefit from other people's tagging?
Is there a reason to use galloping over an array vs a seek in a search tree?

Better real world factors?

Galloping, Demaine set intersection, worst case optimal joins, they all seem to be different aspects of the same underlying principle.

So I'm very curious about the peculiarities in the sorting case.

See CPython's https://github.com/python/cpython/blob/main/Objects/listsort... for details about the sort. In fact, its "galloping" was inspired by a paper of which Demaine was a co-author (reference in the file already linked to).

CPython's lists are implemented as contiguous C arrays of pointers (to Python objects). Any way of grafting a tree-ish structure on top of that would be unacceptably wasteful of space and/or time. So sorting "gallops" over arrays because it's array-in, array-out.

Thank you for the explanation!
I had the same idea, that re-using a lot of the non-sort parts of the TimSort code in this merge, the result would be just "Tim".