|
|
|
|
|
by exDM69
699 days ago
|
|
Problem with radix sorting strings is that it is O(k*N) where k is length of key, in this case it's the second longest string's length. Additional problems arise if you are dealing with null terminated strings and do not have the length stored. Radix sort is awesome if k is small, N is huge and/or you are using a GPU. On a CPU, comparison based sorting is faster in most cases. |
|
I evaluated various sorts for strings as part of my winning submission to https://easyperf.net/blog/2022/05/28/Performance-analysis-an... and found https://github.com/bingmann/parallel-string-sorting to be helpful. For a single core, the fastest implementation among those in parallel-string-sorting was a radix sort, so my submission included a radix sort based on that one.
The only other contender was multi-key quicksort, which is notably not a comparison sort (i.e. a general-purpose string comparison function is not used as a subroutine of multi-key quicksort). In either case, you end up operating on something like an array of structs containing a pointer to the string, an integer offset into the string, and a few cached bytes from the string, and in either case I don't really know what problems you expect to have if you're dealing with null-terminated strings.
A very similar similar radix sort is included in https://github.com/alichraghi/zort which includes some benchmarks, but I haven't done the work to have it work on strings or arbitrary structs.