| The strength of Hadoop isn't so much speed as the fact that it has been around for a while and has a pretty impressive and fairly mature set of projects that make up the Hadoop ecosystem, from YARN to Hive, etc. There are still many issues to resolve, and this evolution will continue for decades to come. The TB sort benchmark is pretty useless to me - I am much more concerned with stability and a vibrant community (which means people, the software they write, and institutions using Hadoop in production). Last time I tinkered with Spark (this was over a year ago) it was so buggy as to be next to useless, but perhaps things have changed. Still - the idea that there is some sort of revolutionary new approach that is paradigm-shifting and way better than anything before should be viewed with extreme skepticism. The problem of distributed computing is not a simple one. I remember tinkering with the Linux kernel back in the mid-nineties, and 20 years later it still has a way to go. Twenty years from now it might or might not be Hadoop that is the tool for this sort of thing; we don't know. But I will not take seriously anything or anyone who claims that the "next best thing" is here in 2014. |
2. Yes, Spark was/is buggy.
3. For me, Spark really is a paradigm shift, a next-generation framework compared to M/R.