| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by adamgordonbell 1806 days ago
	Wow. Thanks for writing it. I believe this is the part you are referring to: https://github.com/python/cpython/blob/main/Objects/listobje...

1 comments

tim-peters 1806 days ago

Right, `merge_at()`. But that in turn calls small mountains of other C code. It is, alas, a complicated approach. Folding it in would be substantially more work than you've already done.

The point: for inputs like your:

a = list(range(-100, 1700)) b = list(range(1400, 1800))

it would first use a variant of binary search to find where b[0] belongs in a. "Almost all of a" is consumed by that via a relatively tiny number of compares. It would similarly do a binary-ish search to find where a[-1] belongs in b. The tail end of b, from 1700 through 1799, then also never needs to be looked at again.

In the lucky case that two input lists don't overlap at all, and a[-1] <= b[0], that's enough so that a "one-pair-at-a-time" compare _never_ needs to be done. In context (the two lists are contiguous slices of an array), no pointers at all need to be copied in that case either - it deduces that the combined array slices are already in sorted order in total time O(log(len(a)) + log(len(b))), and it's done.

link

adamgordonbell 1806 days ago

Very Cool!

Edit: Ok, I get it now. Don't just pop, binary search into it, so you can fast forward through. I'm not sure I'll get around to implementing that, at least not in the immediate future, but that makes total sense.

link

tim-peters 1806 days ago

You already got a win - quit while you're ahead ;-) Note that plain binary search here is probably not a good idea. See "listsort.txt", in the same directory as "listobject.c", for excruciating details.

link

adamgordonbell 1806 days ago

If I get it working, it can be TimMerge.

link

tim-peters 1806 days ago

Haha. AdamMerge works for me - I get enough abuse for naming a sort after myself ;-)

If you pursue this, you can probably throw out piles of the CPython code. That's trying to keep the result "in place", so has major code near-duplication to merge "into the left side" or "into the right side", to minimize the amount of temp memory needed (this depends on which input list is shorter).

But you're writing your output to a new list, so those cases are the same to you.

To keep the sort stable in all cases, though, you still need to distinguish between `gallop_left()` and `gallop_right()`.

link