by fa_il
5101 days ago
Sorry, but applying an old concept to a new problem (actually just new buzzwords... it's only the size of the problem that's new) does not make a "novel" solution. Moreover, it's an obvious solution. But I guess that depends on who is doing the programming. I would love to see how programmers with large clusters at their disposal were approaching large datasets before the moment they realized splitting the task into smaller pieces was what they should do.
|
If you speak with people experienced in multithreaded and distributed programming, you will see that synchronization with fault-tolerance is _hard_, and MapReduce provides a widely applicable set of sufficient conditions for an algorithm to be executable with implicit fault-tolerance and implicit synchronization.
Without MapReduce-like abstractions, every piece of software has to be responsible for its own (1) checkpointing (to recover from errors), (2) checksumming (to verify that no errors happened), and (3) distributed communication (to make sure the global state becomes global and the local state becomes local).
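To make the "implicit fault-tolerance" point concrete, here is a toy single-process sketch in Python. It is not how any real MapReduce implementation works; the names `run_with_retry`, `map_reduce`, and `make_flaky` are made up for illustration, and a `RuntimeError` stands in for a crashed worker. The idea it tries to show: if the mapper and reducer are side-effect-free functions of their input, the framework can simply re-run a failed task, and the user code never has to mention checkpoints or retries.

```python
from collections import defaultdict


def run_with_retry(task, arg, max_attempts=3):
    """Re-run a side-effect-free task until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        try:
            return task(arg)
        except RuntimeError:  # stand-in for "the worker died" (assumption)
            if attempt == max_attempts - 1:
                raise


def map_reduce(chunks, mapper, reducer, max_attempts=3):
    """Toy framework: retries live here, not in the mapper or reducer."""
    groups = defaultdict(list)
    for chunk in chunks:
        # A mapper's output is only merged once the whole task has succeeded,
        # so a failed attempt leaves no partial state behind.
        for key, value in run_with_retry(mapper, chunk, max_attempts):
            groups[key].append(value)

    results = {}
    for key, values in groups.items():
        def reduce_task(vals, key=key):
            return reducer(key, vals)
        results[key] = run_with_retry(reduce_task, values, max_attempts)
    return results


# User code: no checkpointing, no retry logic, no communication.
def word_count_mapper(chunk):
    return [(word, 1) for word in chunk.split()]


def sum_reducer(key, values):
    return sum(values)


def make_flaky(task):
    """Hypothetical helper: the wrapped task raises on its first call only."""
    state = {"failed": False}

    def flaky(arg):
        if not state["failed"]:
            state["failed"] = True
            raise RuntimeError("simulated worker crash")
        return task(arg)

    return flaky


counts = map_reduce(["a b a", "b c"], make_flaky(word_count_mapper), sum_reducer)
print(counts)
```

The first map task crashes once and is silently re-run, yet the word counts come out the same as if nothing had failed. That only works because the mapper has no side effects, which is exactly the kind of sufficient condition the abstraction asks you to satisfy.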