by alextp
5101 days ago
MapReduce as a concept goes beyond Lisp implementations. On the surface it might seem like the point of MapReduce is expressing computations in terms of map and reduce functions. It isn't. The point of MapReduce is reducing the problem of building high-throughput, fault-tolerant distributed systems to a very efficient and reliable distributed sorting algorithm (the shuffle phase, which is provided by the MapReduce framework and not by user code). If you can express all the synchronization in your algorithm in terms of sorting, then whatever you do before sorting (map) or after it (reduce) is kind of trivial, as the hard part is taken care of by the framework. This abstraction is novel and profoundly useful, and that's the point of MapReduce, not so much the actual map() and reduce() functions.
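To make the "it's all about the sort" idea concrete, here is a toy single-process sketch in Python. It is only an illustration under loud assumptions: the names `map_reduce`, `word_mapper`, and `count_reducer` are made up for this example, and the in-memory `sort` stands in for the distributed, fault-tolerant shuffle that a real framework would provide.

```python
from itertools import groupby
from operator import itemgetter


def map_reduce(records, mapper, reducer):
    """Toy, single-process MapReduce (hypothetical helper, not a real framework API)."""
    # Map: each record independently yields zero or more (key, value) pairs.
    pairs = [kv for record in records for kv in mapper(record)]

    # Shuffle: sort by key so that equal keys end up adjacent.
    # This is the only step where records "synchronize" with each other;
    # in a real system it is the distributed sort the framework owns.
    pairs.sort(key=itemgetter(0))

    # Reduce: each run of equal keys is processed independently.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield word.lower(), 1


def count_reducer(word, counts):
    # Sum the 1s collected for this word.
    return word, sum(counts)


counts = map_reduce(["a b a", "b c"], word_mapper, count_reducer)
print(counts)
```

Note that the user-supplied `word_mapper` and `count_reducer` are a few trivial lines each; everything that requires coordination lives in the sort.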
I would love to see how programmers with large clusters at their disposal approached large datasets before they realized that splitting the task into smaller pieces was what they should do.