HN Mirror


by mystique 4100 days ago

MapReduce and RDBMS are apples and oranges: both are good at what they do and effective within their own use cases. One lets you handle any type of data and manage it however you like; the other lets you understand your data if you can live within a defined structure. It is silly to suggest using MapReduce to power a dashboard with sub-second response times. In the same way, it is silly to suggest using MPP or RDBMS-like techniques to process highly unstructured or even semi-structured content.

Apache Spark is getting close to being able to do both, but as a developer building a data stack, I still would not scan terabytes of data every single time if 80% of questions can be answered by reading the data once and saving summarized results in a relational format.
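A minimal sketch of that "read once, save a summary" idea, using only Python's standard library. The record layout, the `daily_summary` table, and the helper names are all hypothetical and chosen for illustration; a real pipeline would do the aggregation in whatever batch engine it already runs and write to whatever relational store it already has.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw event records: (day, country, amount).
RAW_EVENTS = [
    ("2014-01-01", "US", 10.0),
    ("2014-01-01", "US", 5.0),
    ("2014-01-01", "DE", 7.5),
    ("2014-01-02", "US", 2.5),
]


def summarize(events):
    """Single pass over the raw data: count and sum per (day, country)."""
    totals = defaultdict(lambda: [0, 0.0])
    for day, country, amount in events:
        entry = totals[(day, country)]
        entry[0] += 1
        entry[1] += amount
    return [(d, c, n, s) for (d, c), (n, s) in sorted(totals.items())]


def build_summary_table(conn, events):
    """Store the aggregated rows in a relational table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary ("
        "day TEXT, country TEXT, n_events INTEGER, total_amount REAL, "
        "PRIMARY KEY (day, country))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?, ?)",
        summarize(events),
    )
    conn.commit()


def total_for_country(conn, country):
    """Answer a dashboard-style question from the summary, not the raw data."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total_amount), 0.0) "
        "FROM daily_summary WHERE country = ?",
        (country,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
build_summary_table(conn, RAW_EVENTS)
```

After the one-time build, a call such as `total_for_country(conn, "US")` touches only the small summary table, which is the whole point of the approach.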

I thought Hadoop vs. RDBMS was a fight settled maybe 4-5 years ago! Amusing to see it raised again now.

1 comment

JakaJancar 4100 days ago

We have stored many terabytes of unaggregated transaction records in Vertica and analyzed large subsets fully ad-hoc, on-the-fly in less than 5s. We have also used BigQuery to directly power dashboards with few-second response times, also analyzing billions of records at a time.

On the other hand, we both build and use summary tables with Spark (in a relational format to boot, using Spark SQL).

I think you would benefit from re-evaluating the assumptions you made 4-5 years ago.
