| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by asafira 3392 days ago
	It is only linear in std::distance(first, last), so it very well might be (and likely is) nlogk . Edit: see comment below for why it actually is O(n) and not O(n logk) as I had thought.

1 comments

lorenzhs 3392 days ago

But std::distance(first, last) is how n is defined.

std::nth_element is typically implemented using a fancy version of quickselect called introselect. You can imagine quickselect as a version of quicksort where the recursive call is only executed on the side which contains the element. Introselect adds some fanciness to ensure worst-case linear running time if quickselect takes too long (bad pivot selection).

link

asafira 3392 days ago

very cool! I stand corrected, I didn't think of this when I considered it. Sorry for the mix-up

link

bogomipz 3391 days ago

Its O(n logk) then? It maintain a heap I'm guessing?

link

lorenzhs 3391 days ago

No, as per my previous comment, it t does not use a heap. It's like quicksort, but only doing one of the recursive calls:

- choose a pivot p randomly from the elements

- partition into elements <= p and > p. This takes time O(n).

- if the number of elements <= p is at most k (the rank of the element we want), do a recursive call on these elements. Otherwise recurse on the other elements (those >p).

Because a random pivot splits the elements well enough most of the time, this has an expected running time of O(n): 1 + ½ + ¼ + ⅛ + … < 2 in the best case, and similar constants if the partitioning is a little worse. But in the worst case, when the pivot is the smallest/largest element in each iteration, it's O(n²). That's where the fanciness of introselect comes in: if quickselect doesn't converge, it switches to an algorithm with worst-case O(n) time, which is slower in most cases but saves the day when quickselect has trouble.

link

username223 3391 days ago

See "median of medians algorithm": https://en.wikipedia.org/wiki/Median_of_medians

It's hairier than a partial quicksort, but (if you erase the proof and weigh the chalk dust) guaranteed linear time.

link

gpderetta 3391 days ago

and median of medians can be used to implement quicksort with O(NlogN) worst case. It is usually not worth it as other hybrid quicksorts like introsort are faster in practice.

On an unrelated note, I was once asked to prove the upper bound for median of median on an interview...

link

bogomipz 3391 days ago

Thanks, right, the idea to is to only recurse down one side on each pass right? This is the (sub)family of "selection" algorithms?

This is interesting I had not heard of introselect before, I will have to read up on this.

link

lorenzhs 3391 days ago

Yup, exactly. It's a fun area that gets even more interesting when you look at distributed systems, where every node only stores a small portion of the overall data and you want to keep communication volume low.

link

bogomipz 3391 days ago

>"It's a fun area that gets even more interesting when you look at distributed systems"

Can you elaborate a little bit, specifically how selection algorithms help reduce "chatter" in distributed systems? I am not familiar with this and this sounds like an interesting context.

I have heard of Merkle Trees for similar but that's obviously hashing and not selection algorithms, or is there some connection I am not making?

link