| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by alxv 3323 days ago
	There is a proof shown in this handout: https://people.eecs.berkeley.edu/~sinclair/cs271/n15.pdf It's hard to understand why this technique works so well without digging deep in the math. Roughly speaking, if you throw n balls in n bins at random, the maximum of number balls in any bins will grow surprisingly quickly (because of the birthday paradox). However, if we allow ourselves to choose between two random bins instead of one, and put the ball in the one with the fewest balls in it, the maximum number of balls in any bins grow much more slowly (i.e., O(ln ln n)). Hence, having that one extra random choice allows us to get surprisingly close to the optimal approach of comparing all bins (which would give us O(1)), without doing all that work.

3 comments

MichaelGG 3322 days ago

Thanks for the explanation! Much clearer and I get the concept. In the case of load balancing, we'd need a ton of servers (1000s?) for this to pay off vs just comparing all, right? Cache updating aside, most of the overhead would be in reading the load numbers in. Comparing a thousand numbers has to be quick in comparison, no?

link

mrkurt 3322 days ago

The problem with load balancing is herd behavior. Stats for load are usually at least a little stale, because it's a distributed system where you can't afford to wait for consistency. When there are traffic spikes a whole herd of new connections will go to the least loaded server for a window of time where the cached "load" number is out of date. Picking two at random helps keep from a bunch of connections racing to one server, even when you're only running 3-4 of them.

link

nickpsecurity 3322 days ago

That's a really intuitive explanation. Appreciate that.

link

euph0ria 3322 days ago

Thank you sir!

link