Hacker News new | ask | show | jobs
by anonymoushn 701 days ago
No, it's O(N+M) where N is the number of strings and M is the sum of the lengths of the strings. Maybe your radix sort has some problems?

I evaluated various sorts for strings as part of my winning submission to https://easyperf.net/blog/2022/05/28/Performance-analysis-an... and found https://github.com/bingmann/parallel-string-sorting to be helpful. For a single core, the fastest implementation among those in parallel-string-sorting was a radix sort, so my submission included a radix sort based on that one.

The only other contender was multi-key quicksort, which is notably not a comparison sort (i.e. a general-purpose string comparison function is not used as a subroutine of multi-key quicksort). In either case, you end up operating on something like an array of structs containing a pointer to the string, an integer offset into the string, and a few cached bytes from the string, and in either case I don't really know what problems you expect to have if you're dealing with null-terminated strings.

A very similar similar radix sort is included in https://github.com/alichraghi/zort which includes some benchmarks, but I haven't done the work to have it work on strings or arbitrary structs.

2 comments

> No, it's O(N+M) where N is the number of strings and M is the sum of the lengths of the strings.

That would mean it's possible to sort N random 64-bit integers in O(N+M) which is just O(N) with a constant factor of 9 (if taking the length in bytes) or 65 (if taking the length in bits), so sort billions of random integers in linear time, is that truly right?

EDIT: I think it does make sense, M is length*N, and in scenarios where this matters this length will be in the order of log(N) anyway so it's still NlogN-sh.

> That would mean it's possible to sort N random 64-bit integers in O(N+M) which is just O(N) with a constant factor of 9 (if taking the length in bytes) or 65 (if taking the length in bits), so sort billions of random integers in linear time, is that truly right?

Yes. You can sort just about anything in linear time.

> EDIT: I think it does make sense, M is length*N, and in scenarios where this matters this length will be in the order of log(N) anyway so it's still NlogN-sh.

I mainly wrote N+M rather than M to be pedantically correct for degenerate inputs that consist of mostly empty strings. Regardless, if you consider the size of the whole input, it's linear in that.

Yes, radix sort can sort integers in linear O(N) time.
If you have a million 1-character strings and one string of length 1 million, how many steps would your LSD radix sort take? And (if it's indeed linear in the total input size like you say) how do you make it jump over the empty slots without losing real-world efficiency in other cases?
It's MSB radix sort. I think LSB radix sort is not generally as useful because even for fixed-size inputs it often makes more passes over most of the input than MSB radix sort.

Your comment makes me think it would be swell to add a fast path for when the input range compares equal for the current byte or bytes though. In general radix sorts have some computation to spare (on commonly used CPUs, more computation may be performed per element during the histogramming pass without spending any additional time). Some cutting-edge radix sorts spend this spare computation to look for sorted subsequences, etc.

Oh I see. I've found LSD better in some cases, but maybe it's just my implementation. For MSD, do you get a performance hit from putting your stack on the heap (to avoid overflow on recursion)? I don't just mean the minor register vs. memory differences in the codegen, but also the cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack to manage).

> Some cutting-edge radix sorts

Where do you find these? :-)

> For MSD, do you get a performance hit from putting your stack on the heap (to avoid overflow on recursion)?

I'm not sure. My original C++ implementation which is for non-adversarial inputs puts the array containing histograms and indices on the heap but uses the stack for a handful of locals, so it would explode if you passed the wrong thing. The sort it's based on in the parallel-string-sorting repo works the same way. Oops!

> cache misses & memory pressure due to potentially long input elements (and thus correspondingly large stack)

I think this should be okay? The best of these sorts try to visit the actual strings infrequently. You'd achieve worst-case cache performance by passing a pretty large number of strings with a long, equal prefix. Ideally these strings would share the bottom ~16 bits of their address, so that they could evict each other from cache when you access them. See https://danluu.com/3c-conflict/

> Where do you find these? :-)

There's a list of cool sorts here https://github.com/scandum/quadsort?tab=readme-ov-file#varia... and they are often submitted to HN.

Yeah, what I hate about MSD is the stack explosion.

Otherwise - cool, thanks!

Hi, I want to respond to a post from you from 2019. (That 2019 thread no longer offers the reply button, otherwise I would reply there of course.) I apologize for using this thread to get my message in.

This is the item I want to respond to: https://news.ycombinator.com/item?id=19768492 When you took a Classical Mechanics course you were puzzled by the form of the Lagrangian: L = T - V

I have created a resource for the purpose of making application of calculus of variations in mechanics transparent. As part of that the form of the Lagrangian L=T-V is explained.

http://cleonis.nl/physics/phys256/calculus_variations.php

http://cleonis.nl/physics/phys256/energy_position_equation.p...

I recognize the 'you are certainly entitled to ask why' quote, it's from the book 'Classical Mechanics' by John Taylor.

Here's the thing: there is a good answer to the 'why' question. Once you know that answer things become transparent, and any wall is gone.