by StreamBright
3363 days ago
Hadoop certified engineer here. I think Hadoop is losing its popularity or, from a different point of view, it has saturated 90% of its potential market and is having trouble entering other markets. The biggest challenges are operational stability and performance, plus the Hadoop companies' lack of understanding of the performance characteristics of their own systems. On top of that, there are always two versions of everything (Tez vs Impala, ORC vs Parquet, etc.) because Hortonworks (HWX) and Cloudera cannot really work together in an open-source fashion.

On top of everything else, there are better products on the market for many of Hadoop's use cases. An incomplete list: Alluxio, Apache Beam, Apache Kudu. These systems try to address some of the aforementioned shortcomings of Hadoop. Other products, like PrestoDB, take a slightly different approach to one particular problem (accessing data via a SQL-like interface), mix in some extra goodness (in-memory caching), and deliver an entirely different customer experience. If you leave Hadoop land you can also play with Spark or Storm, depending on your use case. Now that Facebook uses Spark, there is a good chance that the average user won't run into scaling issues with it.

I left out products from vendors that target the same customers as the Hadoop vendors on purpose. There are plenty of closed-source solutions that will leave Hadoop in the dust in almost every aspect of big data processing (performance, security, UI, stability, availability, etc.).
|
I agree Hadoop is no longer MapReduce. It's HDFS+YARN. That's it. Distributions package up Spark/Flink/Kafka/PrestoDB with the HDFS/YARN core.
At Hops, we've scaled the core HDFS by 16-37X (https://blog.acolyer.org/2017/03/06/hopfs-scaling-hierarchic...) and we have a distribution called Hopsworks with support for Spark/Flink/TensorFlow. Nobody uses MapReduce on our platform.
The thing that has killed Hadoop, imo, is Kerberos. In Hops, we have switched to using TLS/SSL certificates instead of Kerberos, and that enables us to implement dynamic roles. Dynamic roles allow us to build a software-as-a-service platform where projects are securely isolated from one another.