by fa_il 5102 days ago
The thing is that "MapReduce" is a concept that was in practice, e.g. by LISP programmers, long, long before Google rediscovered it.

Much like Google's many acquisitions that the public perceives as resulting from "Google R&D", things like map-reduce are also viewed as coming from "unparalleled Google capabilities".

Let's get real. Google is a big company that employs thousands upon thousands of overqualified Java and C++ programmers. They are a fat cat. Not necessarily a cunning and agile one.

With the amount of cash they have on hand, indeed they should be producing some interesting research.

But I have a hard time seeing things like map-reduce as state-of-the-art R&D.

That many programmers, who have standards that consistently hover around varying levels of mediocrity, are satisfied with Google's design choices does not necessarily make what they do "state of the art". It just makes it the most popular. (Popularity is of course very important, perhaps all-important, in this business, but has little to do with research and pushing the envelope.)

I see you created a brand new account just to write this post.

MapReduce is the name of a piece of software, not simply the concept of map followed by reduce. That's something that every high school freshman invents on his own in Algebra 1 class. The interesting research area is making that concept scale to "run this command on every web page on the Internet" billions of times a day. I don't know about you, but I don't see anything to do that in my apt repositories.

Research at Google isn't about solving problems that are beyond the comprehension or reach of any average practitioner of programming. A good example is Street View. Anyone can understand strapping some cameras and sensors to a car and driving it around to make pictures of places available on the Internet. It's hard to call that "state of the art research" because it's such a simple idea, but before Google did the research, the product didn't exist. Now you can see almost any street address anywhere in the world in your browser. (The hard part is in the details. How do you map images to locations on the Earth in areas where GPS reception isn't good enough to provide enough accuracy? Scaling something to the entire Earth is not easy.)

> That many programmers, who have standards that consistently hover around varying levels of mediocrity

How exactly did you come to this conclusion?

Right. I would say it's more about execution. For that Google gets full credit. And I do find the execution impressive.

The mediocrity line is my opinion. Not necessarily fact. Downvote me if you are offended.

Better yet, prove me wrong.

> Better yet, prove me wrong.

First you say, all programmers at Google have standards that "hover around mediocrity". Then you say you're amazed at Google's execution on world-changing ideas. Do you see a contradiction?

Argument over, you can close your trolling account! :)

Can you show me where I said:

"all programmers at Google"

"amazed"

"world-changing ideas"

What I see are your words, not mine. Yet they are attributed to me.

If this is an "argument" as you suggest, then you are doing a poor job at making your case.

There is a difference between 1. "research" and developing "novel" ideas and 2. executing well on large projects. It's possible to accomplish 2. without invoking 1., and vice versa.

There is no contradiction.

When a programmer complains there's nothing in "apt" to solve his problem, I'm never impressed. Nor am I ever surprised. Convenience makes some programmers very lazy.

Isn't there a bridge you should be guarding?

Mapreduce as a concept goes beyond lisp implementations. On the surface it might seem like the point of mapreduce is expressing computations in terms of map and reduce functions. It isn't.

The point of mapreduce is reducing the problem of high-throughput fault-tolerant distributed systems to a very efficient and reliable distributed sorting algorithm (the shuffle phase, which is provided by the mapreduce framework and not by the user code). If you can express all synchronization in your algorithm in terms of sorting, then whatever you do before sorting (map) or after it (reduce) is kind of trivial, as the hard part is taken care of by the framework.

This abstraction is novel, and profoundly useful, and that's the point of mapreduce, not so much the actual map() and reduce() functions.
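To make that concrete, here is a rough single-process sketch of the shape I mean, written in Python. It is not how any real implementation works internally; the names run_mapreduce, map_words and reduce_counts are made up for illustration, and the "shuffle" here is just an in-memory sort followed by grouping on the key. The only point is that the user writes the two small functions and the framework owns the sort in between.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """User code: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(key, values):
    """User code: combine all values that were emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical framework code: map, then sort and group by key, then reduce."""
    # Map phase: each record is processed independently of the others.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: sorting by key is the only step that looks at all the data.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: each key's group is again processed independently.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


result = run_mapreduce(["a b a", "b c"], map_words, reduce_counts)
print(result)
```

In a real cluster the sort is distributed and spills to disk, but the user-visible contract stays the same: map and reduce never have to talk to each other directly.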

Sorry, but applying an old concept to a new problem (actually just new buzzwords... it's only the size of the problem that's new) does not make a "novel" solution. Moreover, it's an obvious solution. But I guess that depends on who is doing the programming.

I would love to see how programmers with large clusters at their disposal were approaching large datasets before the moment they realized splitting the task into smaller pieces was what they should do.

It's not about splitting the task into smaller pieces. What makes mapreduce so powerful is factoring out the parts of the task that need synchronization among all machines into one specific subroutine (groupBy).

If you speak with people experienced in multithreaded and distributed programming you will see that synchronization with fault-tolerance is _hard_, and mapreduce provides a widely-applicable set of sufficient conditions for an algorithm to be executable with implicit fault-tolerance and implicit synchronization.

Without mapreduce-like abstractions every piece of software has to be responsible for its own (1) checkpointing (to recover from errors), (2) checksumming (to ensure that no errors happened), and (3) distributed communication (to make sure the global state becomes global and the local state becomes local).
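As a toy illustration of what "implicit" means for the first two items, here is a Python sketch under some loud assumptions: tasks are deterministic functions of their input split, a failure shows up as an exception, and outputs are JSON-serializable. The helpers run_task, verify and checksum are hypothetical, not taken from any real framework, and the sketch substitutes simple re-execution for true checkpointing. The user function contains no recovery logic at all; the wrapper supplies it.

```python
import hashlib
import json


def checksum(value):
    """Stable digest of a JSON-serializable value (an assumption of this sketch)."""
    encoded = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def run_task(task, split, max_attempts=3):
    """Hypothetical framework code: re-run a deterministic task until it succeeds."""
    last_error = None
    for _ in range(max_attempts):
        try:
            output = task(split)
            # Record a digest so a later reader can detect corruption.
            return output, checksum(output)
        except Exception as exc:  # stands in for a crashed worker
            last_error = exc
    raise RuntimeError("task failed on every attempt") from last_error


def verify(output, digest):
    """Hypothetical framework code: refuse output whose digest does not match."""
    if checksum(output) != digest:
        raise ValueError("corrupted task output")
    return output


# User code: no retry or checksum logic in here.
attempts = {"count": 0}


def flaky_square_sum(split):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise OSError("simulated worker crash")
    return sum(x * x for x in split)


output, digest = run_task(flaky_square_sum, [1, 2, 3])
total = verify(output, digest)
print(total, attempts["count"])
```

Re-execution only works because the task has no side effects and depends on nothing but its split, which is exactly the kind of sufficient condition mapreduce asks of user code.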