Hacker News
by fa_il 5101 days ago
Sorry, but applying an old concept to a new problem (actually just new buzzwords... it's only the size of the problem that's new) does not make a "novel" solution. Moreover, it's an obvious solution. But I guess that depends on who is doing the programming.

I would love to see how programmers with large clusters at their disposal were approaching large datasets before the moment they realized splitting the task into smaller pieces was what they should do.

1 comments

It's not about splitting the task into smaller pieces. What makes mapreduce so powerful is that it factors out the parts of the task that need synchronization among all machines into one specific subroutine (groupBy).
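
To make that concrete, here is a minimal single-process sketch of the shape I mean. The names (map_reduce, word_mapper, count_reducer) are made up for illustration and are not any real framework's API. The point is only that the mapper and reducer never coordinate with each other, and the groupBy step is the one place that has to see everyone's output.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    # Map phase: each record is handled independently, so no coordination is needed.
    pairs = [kv for record in records for kv in mapper(record)]

    # groupBy (shuffle): the single step that must see the output of every mapper.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: each key is handled independently again.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # Emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def count_reducer(key, values):
    # Sum the counts collected for one word.
    return sum(values)

counts = map_reduce(["a b a", "b c"], word_mapper, count_reducer)
```

In a real cluster the map and reduce calls would run on different machines, but the user-supplied functions would look the same; only the groupBy in the middle has to be distributed by the framework.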

If you speak with people experienced in multithreaded and distributed programming, you will see that synchronization with fault-tolerance is _hard_. Mapreduce provides a widely applicable set of sufficient conditions for an algorithm to be executable with implicit fault-tolerance and implicit synchronization.
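
A rough sketch of why those conditions help, assuming the tasks are side-effect-free functions of their input: if a worker dies, the framework can simply run the task again and get the same answer. The helpers below (run_with_retries, make_flaky) are hypothetical and only simulate crashes in one process; real systems also have to detect failures and reschedule work on other machines.

```python
def run_with_retries(task, arg, max_attempts=5):
    # Re-run a side-effect-free task until it succeeds or attempts run out.
    for _ in range(max_attempts):
        try:
            return task(arg)
        except RuntimeError:
            continue  # safe to retry because the task has no side effects
    raise RuntimeError("task failed after %d attempts" % max_attempts)

def make_flaky(task, failures):
    # Wrap a task so its first `failures` calls raise, simulating crashed workers.
    state = {"remaining": failures}

    def flaky(arg):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated worker crash")
        return task(arg)

    return flaky

def square(x):
    return x * x

flaky_square = make_flaky(square, failures=2)
results = [run_with_retries(flaky_square, x) for x in [1, 2, 3]]
```

The code that writes square never mentions failure at all; the retry logic lives entirely in the runner, which is the sense in which the fault-tolerance is implicit.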

Without mapreduce-like abstractions every piece of software has to be responsible for its own (1) checkpointing (to recover from errors), (2) checksumming (to ensure that no errors happened), and (3) distributed communication (to make sure the global state becomes global and the local state becomes local).