by darose 3376 days ago
This really strikes me as more of a marketing piece by Snowflake than a well-researched piece of reporting. The article mostly just quotes one person, Bob Muglia, who is, as they say on Wall Street, "talking his book": i.e., giving an opinion that is not coincidentally in line with his own financial interests. Sure, Hadoop is getting old, and it is quickly being replaced by Spark. But loads of organizations have used, and continue to use, Hadoop/Spark successfully. And the part about Kafka replacing Hadoop/Spark is just silly. They're completely different technologies, used for very different purposes, and many organizations use both side by side.

Disclaimer: I run a technology vendor that is partially invested in the success of the Hadoop ecosystem.

You are making a very common mistake: coupling the file system, the MapReduce implementation, and the scheduler. You are right about this post and about Kafka, though. Let me expand on this point a bit more.

Hadoop isn't what it was in 2004. It's now a complex beast with several decoupled components, which makes it very hard for people outside the space to identify what "Hadoop" is.

The Hadoop ecosystem is actually very healthy if you look at all the streaming platforms built on top of it (Kafka, Apex, Spark, Flink, Tez, ...).

There are also databases such as HBase, Cassandra, and more recently Kudu, each specialized for different workloads. Don't even get me started on all the SQL implementations, such as Impala and Hive (again, each with its own trade-offs).

If we step back for a second and focus on just the compute part: yes, MapReduce is for the most part dead. It has been supplanted by streaming and batch platforms such as Flink and Spark.

The scheduler (YARN) is competing with Mesos, largely because Spark and Flink can run on both, with Mesos being far more flexible. (Most Hadoop distros only use YARN, though.)

Then there is the distributed consensus piece: ZooKeeper. etcd is up and coming here, but your Hadoop cluster uses ZooKeeper (both Mesos and Kafka rely on it, for example).

The article also quotes Bobby Johnson, who helped run Facebook's Hadoop cluster, as well as the creator of Kafka (who ran Hadoop clusters at LinkedIn).

For what it's worth, all three of them seemed pretty down on Hadoop.

I think the parent is right, though. On a side note: having seen how PR pieces are crafted, this feels like something Snowflake put together and then passed on to Datanami with a "we have a blog post we'd like you to publish" type of email. The claim is somewhat unsubstantiated, but everything about the piece reeks of trying to get the reader to discover Snowflake at the start and to think of it again at the end.

Doing a quick search for "hadoop" on the Snowflake domain, and for the term "hadoop" alongside "snowflake", I keep finding that Snowflake has a definite target in mind: converting Hadoop users, or people evaluating Hadoop, to choose Snowflake instead. They even have a webinar specifically for that segment.

Searching further for Alex Woodie and mentions of Snowflake turns up multiple articles featuring the CEO across several sites, including Datanami and EnterpriseTech.

All that is circumstantial, but I'm exercising a healthy bit of skepticism about whether this piece is pure research by Alex Woodie. Looking a little more objectively at the "points" of the article, here is what I see:

Bob Muglia has never met a happy Hadoop customer. The article mentions a couple of things that might replace Hadoop in the future.

Bob Muglia has only seen a few customers who've tamed Hadoop.

Some discussion with and about Facebook's experience with Hadoop, painting Hadoop as hard work from the outset.

More discussion with other tech folks (Kafka and DataTorrent). One is an alternative of sorts; the other again discusses the pain of Hadoop.

And then back to Bob Muglia, his target customers for Snowflake ("Hadoop refugees"), and his belief that we are in the valley of despair regarding Hadoop.

Which brings us to the final implied point of the article: ditch Hadoop sooner rather than later, and here are the alternatives, the main one pushed from start to end being Snowflake.

I apologise if this was too far off topic. I think discussing Hadoop's validity, or how it's being used, is worthwhile. I also believe it's healthy to call out suspect pieces like this, because the core of the article itself provides little to no critical value.