by darose
3376 days ago
This really strikes me as more of a marketing piece by Snowflake than a well-researched piece of reporting. The article mostly just quotes one person, Bob Muglia, who is, as they say on Wall Street, "talking his book", i.e. giving an opinion that is not coincidentally in line with his own financial interests. Sure, Hadoop is getting old, and is quickly being replaced by Spark. But loads of organizations have used, and continue to use, Hadoop/Spark successfully. And the part about Kafka replacing Hadoop/Spark is just silly. They're completely different technologies, used for very different purposes, and many organizations use both side by side.
|
You are making a very common mistake of coupling the file system, the MapReduce implementation, and the scheduler. You are right about this post and Kafka, though. Let me expand on this point a bit more.
Hadoop isn't what it was in 2004. It's now a complex beast with several decoupled components, which makes it very hard for people outside the space to identify what "Hadoop" is.
The Hadoop ecosystem is actually very healthy if you look at all the streaming platforms built on top of it (Kafka, Apex, Spark, Flink, Tez, ...).
There are also databases such as HBase, Cassandra, and more recently Kudu, each specialized for different workloads. Don't even get me started on all the SQL implementations (again with their own trade-offs) such as Impala and Hive.
If we step back for a second and focus on just the compute part: yes, MapReduce is for the most part dead. It has been supplanted by streaming and batch platforms such as Flink and Spark.
The scheduler part (YARN) is competing with Mesos, largely thanks to Spark and Flink being able to run on both, with Mesos being way more flexible. (Most Hadoop distros only use YARN, though.)
Then we also have the distributed consensus part in ZooKeeper. etcd is up and coming in this space, but your Hadoop cluster uses ZooKeeper (both Mesos and Kafka rely on it, for example).
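
To make the scheduler point a bit more concrete, here is a rough sketch of how the same Spark job can be pointed at either YARN or Mesos just by changing the `--master` URL passed to `spark-submit`. The helper name `spark_submit_cmd`, the host `mesos-master.example.com`, and the script `job.py` are made up for illustration; the `mesos://` master only applies to Spark releases that ship Mesos support, and a real YARN submission also needs the Hadoop client configuration available on the submitting machine.

```python
# Hypothetical helper: build a spark-submit command line for a given
# cluster manager. Only the --master value changes between schedulers;
# the application itself stays the same.
MASTERS = {
    "yarn": "yarn",                    # YARN is located via the Hadoop config
    "mesos": "mesos://{host}:{port}",  # Mesos master address (assumed host/port)
}


def spark_submit_cmd(manager, app, host="mesos-master.example.com", port=5050):
    """Return the argv list for submitting `app` to the chosen scheduler."""
    master = MASTERS[manager].format(host=host, port=port)
    return ["spark-submit", "--master", master, app]


if __name__ == "__main__":
    for manager in ("yarn", "mesos"):
        print(" ".join(spark_submit_cmd(manager, "job.py")))
```

The compute engine (Spark) and the scheduler (YARN or Mesos) are separate choices here, which is the decoupling I mean above.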