|
|
|
|
|
by Someone
700 days ago
|
|
FTA: “Proof of Average O(n) On average, the pivot will split the list into 2 approximately equal-sized pieces. Therefore, each subsequent recursion operates on 1⁄2 the data of the previous step.” That “therefore” doesn’t follow, so this is more an intuition than a proof. The problem with it is that the medium is more likely to end up in the larger of the two pieces, so you more likely have to recurse on the larger part than on the smaller part. What saves you is that O(n) doesn’t say anything about constants. Also, I would think you can improve things a bit for real world data by, on subsequent iterations, using the average of the set as pivot (You can compute that for both pieces on the fly while doing the splitting. The average may not be in the set of items, but that doesn’t matter for this algorithm). Is that true? |
|
Since these remaining fractions combine multiplicatively, we actually care about the geometric mean of the remaining fraction of the array, which is e^[(integral of ln(x) dx from x=0.5 to x=1) / (1 - 0.5)], or about 73.5%.
Regardless, it forms a geometric series, which should converge to 1/(1-0.735) or about 3.77.
Regarding using the average as the pivot: the question is really what quantile would be equal to the mean for your distribution. Heavily skewed distributions would perform pretty badly. It would perform particularly badly on 0.01*np.arange(1, 100) -- for each partitioning step, the mean would land between the first element and the second element.