Hacker News
by dmayle 293 days ago
As to MapReduce, I think you're fundamentally mistaken. You can talk about map and reduce in the lambda calculus sense of the term, but in terms of high performance distributed calculations, MapReduce was definitely invented at Google (by Jeff Dean and Sanjay Ghemawat in 2004).
3 comments

Not quite. Google brilliantly rebranded the work of John McCarthy, C.A.R. Hoare, Guy Steele, _et al._ from 1960 onward, e.g. https://dl.acm.org/doi/10.1145/367177.367199

Dean, Ghemawat, and Google at large deserve credit not for inventing map and reduce—those were already canonical in programming languages and parallel algorithm theory—but for reframing them in the early 2000s against the reality of extraordinarily large, scale-out distributed networks.

Earlier takes on these primitives had been about generalizing symbolic computation or squeezing algorithms into environments of extreme resource scarcity. The 2004 MapReduce paper was also about scarcity—but scarcity redefined, at the scale of global workloads and thousands of commodity machines. That reframing was the true innovation.
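
To make the distinction concrete, here is a rough single-process sketch in Python. The first function uses map and reduce as plain functional primitives. The rest imitates the map, shuffle, reduce shape of a MapReduce-style word count. All function names here are made up for illustration, and nothing is actually distributed; this is not how Google's system is implemented, only the general shape of the programming model.

```python
from collections import defaultdict
from functools import reduce


# Classic functional primitives: map a function over a list, then fold.
def total_length(words):
    return reduce(lambda acc, n: acc + n, map(len, words), 0)


# MapReduce-style word count, simulated in one process.
# The helper names below are hypothetical, chosen for this sketch.
def map_phase(doc):
    # Emit one (key, value) pair per word.
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    # Group values by key. In a real cluster this is the step that
    # moves data between machines; here it is just a dictionary.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values that share a key.
    return key, sum(values)


def word_count(docs):
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The primitives in `total_length` are the old idea. The grouping by key in `shuffle`, plus everything a real system has to do around it (partitioning, scheduling, retrying failed workers), is where the scale-out framing comes in.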

CERN was doing the equivalent of MapReduce before Google existed.
Got a link? HEP data is really about triggers and widely distributed screening that's application specific AFAIK.
My main reference is the head of computing at CERN, who explained this to me. He gave some early examples of ROOT (https://en.wikipedia.org/wiki/ROOT) using parallel processing of the ROOT equivalent of SSTables.
SPMD was definitely a thing before MapReduce.