| HN Mirror



	by placeybordeaux 3457 days ago
	The picture has now gotten a little fuzzier, as this blog post conflates MapReduce and YARN and calls them both Hadoop. The Scala pseudocode is just about exactly what you'd use with Spark, which runs on YARN.

1 comment

I think his point is that bloated, over-engineered Big Data systems—whether batch or streaming—are overkill for the vast majority of problems.

It's just that many of the points don't really apply to things like Spark or Tez, which run on YARN:

ex: Hadoop << SQL, Python Scripts

I completely agree with

MapReduce << SQL, Python Scripts

I do a lot of my processing in Spark SQL and through RDD transformations, as opposed to MapReduce's limiting, slow KV-style processing.
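
To make the KV-style complaint concrete, here is a rough plain-Python sketch. It is not Hadoop or Spark code: `run_job` and the `Pipeline` class are hypothetical toy stand-ins I made up for illustration, and the records are invented. The first half phrases every step as a mapper and reducer over key/value pairs and needs a separate job per stage; the second half chains transformations in the general style of an RDD pipeline.

```python
from collections import defaultdict

# Invented toy input: (user, bytes) records.
records = [("alice", 120), ("bob", 300), ("alice", 80), ("carol", 50), ("bob", 100)]


def run_job(data, mapper, reducer):
    """Toy MapReduce-style job: map to (key, value) pairs, group by key, reduce."""
    groups = defaultdict(list)
    for record in data:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


# Job 1: total bytes per user.
totals = run_job(
    records,
    mapper=lambda r: [(r[0], r[1])],
    reducer=lambda key, values: (key, sum(values)),
)

# Job 2: a second full job just to keep users over 150 bytes.
heavy_mr = dict(
    run_job(
        totals,
        mapper=lambda kv: [kv] if kv[1] > 150 else [],
        reducer=lambda key, values: (key, values[0]),
    )
)


class Pipeline:
    """Toy in-memory chain of transformations (loosely RDD-like, not Spark's API)."""

    def __init__(self, items):
        self.items = list(items)

    def filter(self, pred):
        return Pipeline(x for x in self.items if pred(x))

    def reduce_by_key(self, fn):
        acc = {}
        for key, value in self.items:
            acc[key] = fn(acc[key], value) if key in acc else value
        return Pipeline(acc.items())

    def collect(self):
        return self.items


# Same result as the two jobs above, written as one chained expression.
heavy_chain = dict(
    Pipeline(records)
    .reduce_by_key(lambda a, b: a + b)
    .filter(lambda kv: kv[1] > 150)
    .collect()
)

print(heavy_mr)
print(heavy_chain)
```

Both versions compute the same thing. The difference is only in how much ceremony each stage needs, which is roughly the ergonomic gap I mean; the sketch says nothing about the performance side.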