by dikei 876 days ago
MapReduce was basically a very verbose, imperative way to perform a scalable, larger-than-memory aggregate-by-key operation.

It was necessary as a first step, but as soon as we had better abstractions, everyone stopped using it directly, except of course for legacy maintenance.
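To make "aggregate-by-key" concrete, here is a minimal single-process sketch of the three phases (map, shuffle, reduce), using word counting as the job. The names `map_reduce`, `word_mapper`, and `count_reducer` are made up for illustration and do not come from Hadoop or any other framework. A real system would partition the shuffle across machines and spill to disk; this version keeps everything in memory.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory version of the map -> shuffle -> reduce pipeline."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    # Hypothetical reducer: sum the ones emitted for this word.
    return sum(values)


lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts)
```

Even in this toy form, the job has to be spelled out as two functions plus plumbing, which is roughly the verbosity being described.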

2 comments

The abstraction came first. MapReduce was quickly used as a basis for larger-than-machine SQL (Google Dremel and Hadoop Pig). MapReduce was separately useful when the processing pieces required a lot of custom code that didn't fit well into SQL (because you had hierarchical rather than purely relational records, for example).
Can you please point to the better abstractions?
SQL comes to mind.

Every time you run an SQL query on BigQuery, for example, you are executing those same fundamental map and shuffle primitives on the underlying data; it's just that the interface is very different.
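A small sketch of that equivalence, under loud assumptions: it uses Python's built-in `sqlite3` on a single machine, not BigQuery, and says nothing about how BigQuery actually plans or executes queries. The `events` table and its rows are invented for the example. It only shows that a `GROUP BY` aggregate and a hand-written map, sort-based shuffle, and reduce arrive at the same answer.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Made-up sample data: (country, amount) pairs.
rows = [("us", 3), ("de", 5), ("us", 4), ("fr", 1), ("de", 2)]

# Declarative version: one GROUP BY query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
sql_result = dict(
    conn.execute("SELECT country, SUM(amount) FROM events GROUP BY country")
)
conn.close()

# Imperative version of the same aggregate.
# Map: emit a (key, value) pair per row.
mapped = [(country, amount) for country, amount in rows]
# Shuffle: sort so that equal keys become adjacent.
shuffled = sorted(mapped, key=itemgetter(0))
# Reduce: sum the values within each run of equal keys.
manual_result = {
    key: sum(amount for _, amount in group)
    for key, group in groupby(shuffled, key=itemgetter(0))
}

print(sql_result == manual_result)
```

The query states what to compute; the second half states how, which is the difference in interface.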