|
|
|
|
|
by anonymoushn
701 days ago
|
|
No, it's O(N+M) where N is the number of strings and M is the sum of the lengths of the strings. Maybe your radix sort has some problems? I evaluated various sorts for strings as part of my winning submission to https://easyperf.net/blog/2022/05/28/Performance-analysis-an... and found https://github.com/bingmann/parallel-string-sorting to be helpful. For a single core, the fastest implementation among those in parallel-string-sorting was a radix sort, so my submission included a radix sort based on that one. The only other contender was multi-key quicksort, which is notably not a comparison sort (i.e. a general-purpose string comparison function is not used as a subroutine of multi-key quicksort). In either case, you end up operating on something like an array of structs containing a pointer to the string, an integer offset into the string, and a few cached bytes from the string, and in either case I don't really know what problems you expect to have if you're dealing with null-terminated strings. A very similar similar radix sort is included in https://github.com/alichraghi/zort which includes some benchmarks, but I haven't done the work to have it work on strings or arbitrary structs. |
|
That would mean it's possible to sort N random 64-bit integers in O(N+M) which is just O(N) with a constant factor of 9 (if taking the length in bytes) or 65 (if taking the length in bits), so sort billions of random integers in linear time, is that truly right?
EDIT: I think it does make sense, M is length*N, and in scenarios where this matters this length will be in the order of log(N) anyway so it's still NlogN-sh.