Hacker News new | ask | show | jobs
by kwillets 3220 days ago
The bound on string sorting is typically written as O(n log n + D), where D is the sum of distinguishing prefixes, ie input length minus some fluff. Since D >= n log n we already have linearity on input length.
1 comments

Are you saying that you can sort strings in O(n log n + D) using a conventional sorting algorithm such as merge sort? If so, I don't understand why D would be an additive factor implying that each string is only involved in a constant number of comparisons.

(I wasn't, by the way, only considering strings when discussing variable sized keys -- the beauty of discrimination is that we essentially reduce all sorting problems to string sorting.)

A conventional sorting algorithm such as Multikey Quicksort.

http://people.mpi-inf.mpg.de/~sanders/courses/algdat03/salgd...

When people say "conventional" sorting algorithms, they're usually talking about sorting algorithms based around pairwise comparison functions.

I note on slide 14 of this presentation, it looks like this is sort of the discriminator for selecting a better partitioning scheme. So it looks to me like this actually leverages a similar principle?

As we've seen in this thread and others, there are some other ways to measure and/or process that have different characteristics. Surely all of these deserve attention! So, thanks very much for sharing this.

Sorry, I was being imprecise. By conventional I meant comparison-based sorting functions that are polymorphic in the element type and thus not allowed to examine individual bytes.

Multikey Quicksort indeed looks like a special case of discrimination, exploiting some of the same principles.