Hacker News new | ask | show | jobs
by teo_zero 699 days ago
> On average, the pivot will split the list into 2 approximately equal-sized pieces.

Where does this come from?

Even assuming a perfect random function, this would be true only for distributions that show some symmetry. But if the input is all 10s and one 5, each step will generate quite different-sized pieces!

1 comments

I think you answered your own question. It's the standard average-time analysis of Quicksort and the (unmentioned) assumption is that the numbers are from some uniform distribution.

Why would the distribution have to be symmetric? My intuition is that if you sample n numbers from some distribution (even if it's skewed) and pick a random number among the n numbers, then on average that number would be separate the number into two equal-sized sets. Are you saying that is wrong?

With real numbers, I have the same intuition. But with integers, where 2 or more elements can be exactly the same, and with the two sets defined as they are defined in TFA, that is one "less than" and one "greater or equal", then I'd argue that the second set will be bigger than the former.

In the pathological case where all the elements are the same value, one set will always be empty and the algorithm will not even terminate.

In a less extreme case where nearly all the items are the same except a few ones, then the algorithm will slowly advance, but not with the progression n, n/2, n/4, etc. that is needed to prove it's O(n).

Please note that the "less extreme case" I depicted above is quite common in significant real-world statistics. For example, how many times a site is visited by unique users per day: a long sequence of 1s with some sparse numbers>1. Or how many children/cars/pets per family: many repeated small numbers with a few sparse outliers. Etc.