| HN Mirror



	by dmayle 293 days ago
	As to MapReduce, I think you're fundamentally mistaken. You can talk about map and reduce in the lambda calculus sense of the term, but in terms of high performance distributed calculations, MapReduce was definitely invented at Google (by Jeff Dean and Sanjay Ghemawat in 2004).

3 comments

jonathaneunice 293 days ago

Not quite. Google brilliantly rebranded the work of John McCarthy, C.A.R. Hoare, Guy Steele, _et al._ from 1960 onward, e.g. https://dl.acm.org/doi/10.1145/367177.367199

Dean, Ghemawat, and Google at large deserve credit not for inventing map and reduce—those were already canonical in programming languages and parallel algorithm theory—but for reframing them in the early 2000s against the reality of extraordinarily large, scale-out distributed networks.

Earlier takes on these primitives had been about generalizing symbolic computation or squeezing algorithms into environments of extreme resource scarcity. The 2004 MapReduce paper was also about scarcity—but scarcity redefined, at the scale of global workloads and thousands of commodity machines. That reframing was the true innovation.


dekhn 293 days ago

CERN was doing the equivalent of MapReduce before Google existed.


wbl 293 days ago

Got a link? HEP data is really about triggers and widely distributed screening that's application specific AFAIK.


dekhn 293 days ago

My main reference is the head of computing at CERN, who explained this to me. He gave some early examples of ROOT (https://en.wikipedia.org/wiki/ROOT) using parallel processing of the ROOT equivalent of SSTables.


mr_toad 292 days ago

SPMD was definitely a thing before MapReduce.
