Hacker News new | ask | show | jobs
by dekhn 2863 days ago
MapReduce is filled with algorithms. The core system is basically a merge sort engine (the shuffler). In fact, if you ever wondered about the question "sort 2M integers in 1M RAM", well, that was Jeff asking people if they could understand that shuffle was out-of-core.

Another example is lexicographic range sharding, which uses reservoir sampling to compute optimal tablet key split points by doing a constant-space-and-time heuristic sampling over the keyspace.

I used to think MR was just brute force, but it has many levels of algorithms. Probably too many- at some point it because hard to analyze how the system worked because of the various kinds of hedging and recovery strategies.