HN Mirror

by andrewmccall 5308 days ago

For raw storage Hadoop beats an RDBMS; sure, I'll buy that argument. It's not the same thing, though, and it doesn't do the same job.

Hadoop excels at data processing: trawling vast quantities of unstructured or semi-structured data and extracting information from it. It's a poor platform for random access to specific elements of that data, though.

RDBMSs are great in exactly the places Hadoop isn't: getting access to random elements of data in a structured manner and executing structured queries on that data. Those are things you know you'll do a lot of and can optimise.
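To make the access-pattern difference concrete, here's a rough sketch in plain Python. It isn't Hadoop code: the full scan just stands in for batch-style processing, and an in-memory SQLite table stands in for an RDBMS. The records, table name and function names are all made up for illustration.

```python
import sqlite3

# Hypothetical sample records (id, name, score); values are made up.
rows = [(i, f"user{i}", i % 7) for i in range(1000)]


def scan_for_user(records, user_id):
    """Batch-style access: read every record to find a single one."""
    touched = 0
    found = None
    for rec in records:
        touched += 1
        if rec[0] == user_id:
            found = rec
    return found, touched


# SQLite stands in for an RDBMS here; the primary key gives us an index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)


def lookup_user(connection, user_id):
    """RDBMS-style access: a structured query resolved via the key index."""
    cur = connection.execute(
        "SELECT id, name, score FROM users WHERE id = ?", (user_id,)
    )
    return cur.fetchone()


scan_result, records_touched = scan_for_user(rows, 742)
indexed_result = lookup_user(conn, 742)
```

Both calls return the same row, but the scan reads all 1000 records to get there, while the query goes through the primary key. Scale the scan up to a cluster's worth of data and you can see why you wouldn't want it serving single-row lookups.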

There are column and table data stores built on top of Hadoop, and it can be argued that they could be used as an alternative to an RDBMS, but they aren't drop-in replacements and for the most part they're not meant to do the same job.

The most interesting uses of Hadoop aren't going to come from replacing existing RDBMS infrastructure with a Hadoop cluster. They're going to come from pushing data into a Hadoop cluster to process it, collecting data that would otherwise be impossible to collect because it's either unstructured or there is simply too much of it to put in an RDBMS at a cost-effective scale.

Hadoop and the NoSQL movement get exciting when you start to think about processing that data and pulling what's useful back out into your existing infrastructure.
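As a toy version of that workflow, here's a sketch that runs a map/reduce-style pass over some messy log lines and then loads only the small summary into a relational table. Again, this is plain Python pretending to be a Hadoop job, with SQLite playing the existing infrastructure; the log format, table name and function names are assumptions for the example.

```python
import sqlite3
from collections import Counter

# Hypothetical semi-structured log lines; the format is made up.
raw_lines = [
    "2011-01-01 GET /home status=200",
    "2011-01-01 GET /missing status=404",
    "garbage line with no status",
    "2011-01-02 POST /login status=200",
    "2011-01-02 GET /home status=500",
]


def map_line(line):
    """Map step: emit (status, 1) for lines that carry a status field."""
    for token in line.split():
        if token.startswith("status="):
            yield token[len("status="):], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


pairs = (pair for line in raw_lines for pair in map_line(line))
status_counts = reduce_counts(pairs)

# Pull only the useful summary back into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_summary (status TEXT PRIMARY KEY, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO status_summary VALUES (?, ?)", status_counts.items()
)
hits_200 = conn.execute(
    "SELECT hits FROM status_summary WHERE status = ?", ("200",)
).fetchone()[0]
```

The raw lines never go near the database. Only the per-status counts do, and from there they can be queried like any other structured data.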