by zippy5 1814 days ago

My interpretation is that this is why it's so brilliant.

It's incredibly simple for the end user conceptually, but it encapsulates optimizing processing across a distributed file system, fault tolerance, shuffling key-value pairs, job stage planning, handling intermediates, etc.

Hadoop is a big data framework that reduces the level of competence required to write data pipelines, because it was able to hide a massive amount of complexity behind the MapReduce abstraction.
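To make that concrete, here is a minimal single-process sketch of what the end user writes versus what the framework hides. `run_mapreduce` is a hypothetical toy helper, not Hadoop's API; on a real cluster the grouping step would be a distributed shuffle with fault tolerance and job planning around it.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for a MapReduce runtime (hypothetical, single process)."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # "Shuffle": collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}

# Everything below is all the "end user" has to write.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1

def count_reducer(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = run_mapreduce(lines, word_mapper, count_reducer)
print(word_counts)
```

The user-facing part is two small pure functions; everything else lives inside the runtime, which is where the hidden complexity sits.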

I'd even argue that Hive, Snowflake, and other SQL data warehouses have taken this idea further, since most SQL primitives can be implemented as MapReduce derivatives. With this next level of abstraction, DBAs and non-engineers are writing MapReduce computations.
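As a rough illustration of the "SQL primitive as MapReduce derivative" idea, a query like `SELECT dept, SUM(salary) FROM employees GROUP BY dept` can be read as a map step that emits `(dept, salary)`, a sort/shuffle on the key, and a reduce step that sums each group. The table, column names, and `group_by_sum` helper below are made up for the example, and this is a sketch of the concept rather than a description of how any particular warehouse plans queries.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of an `employees` table: (dept, salary).
employees = [("eng", 120), ("ops", 90), ("eng", 100), ("ops", 70), ("sales", 80)]

def group_by_sum(rows):
    # Map: emit a (group key, value) pair for every row.
    pairs = [(dept, salary) for dept, salary in rows]
    # Shuffle/sort: bring rows with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: aggregate the values of each key group.
    return {
        dept: sum(salary for _, salary in group)
        for dept, group in groupby(pairs, key=itemgetter(0))
    }

totals = group_by_sum(employees)
print(totals)
```

Someone writing the SQL never sees the map, sort, or reduce steps, which is the extra layer of abstraction being described.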

I think my point is that abstractions like MapReduce have had a democratizing effect on who can implement high-scale data processing, and their value is that they took something incredibly complex and made it simple.

2 comments

psfried 1814 days ago

I agree with this. As soon as the MapReduce paper came out, people were criticizing it for a lack of novelty, claiming that so-and-so has been using these same techniques for years. And of course those critics are still around saying the same things. But I think there's a reason we keep going back to these techniques, and I think it's because they repeatedly prove to be practical and effective.

willvarfar 1814 days ago

It reminds me more of Timescale's continuous aggregates and the aggregation indexes of Firebolt, the new Snowflake slayer.