by pixelmonkey
2194 days ago
Michael Stonebraker has an interesting set of conclusions in his assessment of the MapReduce vendor market in 2015 from the "Dataflow" chapter here:
"- Just because Google thinks something is a good idea does not mean you should adopt it.
- Disbelieve all marketing spin, and figure out what benefit any given product actually has. This should be especially applied to performance claims.
- The community of programmers has a love affair with “the next shiny object”. This is likely to create “churn” in your organization, as the “half-life” of shiny objects may be quite short."
|
Map() is not equivalent to a SQL GROUP BY clause; it is equivalent to a user-defined table function used in a FROM clause. This mimics the Extract and Transform stages of a SQL ETL pipeline, where the Extract is implied by the input format.
Reduce() is very much equivalent to a user-defined aggregate function. D&S [1] accurately criticize the suboptimal materialization of intermediate data sets, but they underappreciate the implicit input split and distributed sorting mechanism, which dominated the Terasort benchmark (a Jim Gray creation) at the time.
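To make the analogy concrete, here is a minimal single-process sketch in Python. The names `map_words`, `reduce_count`, and `run_job` are hypothetical and come from no real framework; the in-memory sort merely stands in for the distributed shuffle/sort that a real MapReduce runtime performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Table-function analogue: one input row in, zero or more (key, value) rows out."""
    for word in line.split():
        yield (word.lower(), 1)


def reduce_count(key, values):
    """Aggregate-function analogue: fold every value for one key into a single row."""
    return (key, sum(values))


def run_job(lines):
    # Map phase: roughly "SELECT ... FROM input, map_words(input.line)".
    mapped = [pair for line in lines for pair in map_words(line)]
    # Shuffle/sort: the grouping by key happens here, not inside Map().
    # A real cluster does this as a distributed sort; this is an in-memory stand-in.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one aggregate call per key group.
    return [
        reduce_count(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


result = run_job(["the quick fox", "The lazy dog", "the end"])
print(result)
```

Note that the GROUP BY behaviour lives entirely in the sort-and-group step between the two user functions, which is why Map() on its own is closer to a table function than to a grouping clause.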
On-premises commodity Hadoop clusters lost out to public Infrastructure-as-a-Service clusters. None of the five takedown categories turned out to be important. The tools have evolved, and cloud-native data warehouses and ETL systems now offer the best of both worlds.
[1] https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_...