Hacker News new | ask | show | jobs
by bogomipz 3391 days ago
Thanks, right, the idea to is to only recurse down one side on each pass right? This is the (sub)family of "selection" algorithms?

This is interesting I had not heard of introselect before, I will have to read up on this.

1 comments

Yup, exactly. It's a fun area that gets even more interesting when you look at distributed systems, where every node only stores a small portion of the overall data and you want to keep communication volume low.
>"It's a fun area that gets even more interesting when you look at distributed systems"

Can you elaborate a little bit, specifically how selection algorithms help reduce "chatter" in distributed systems? I am not familiar with this and this sounds like an interesting context.

I have heard of Merkle Trees for similar but that's obviously hashing and not selection algorithms, or is there some connection I am not making?

Well if you'd like to select the k-th biggest element from a distributed array it's impractical - if not impossible - to send everything to one machine so another approach is needed. A fairly simple algorithm would select a pivot randomly on a random node and broadcast it to all other nodes. Each node then does the partitioning locally and counts partition sizes. Now you do a sum all-reduction on the sizes to decide which side you need to recurse on. This takes logarithmically many rounds in expectation just like the sequential algorithm, but it could result in very unbalanced work if the data isn't distributed randomly. It also has even more trouble if the pivot selection is bad because of the communication overhead. There are various techniques to overcome these challenges but it's a bit much to go into here.