I'm surprised hadoop is at the top for distributed processing. I don't imagine many businesses really actually want hadoop. Anybody here using it as part of their stack and can justify its use?
I don't imagine many businesses really actually want hadoop.
I imagine quite a few do. They may not be using the map/reduce API (although there are almost certainly use cases where that makes sense too), but HDFS and Yarn are pretty ubiquitous.
I assume the main reason people are using hadoop these days is for HDFS. Spark has supplanted it for actual processing. So the reason hadoop could be so high is that when organizations use part of the hadoop ecosystem (like Cassandra, Hive, and of course Spark) they mention hadoop along with it. Then even though not many organizations are using vanilla hadoop, since it's used in conjunction with many other technologies, it dominates the list.
I imagine quite a few do. They may not be using the map/reduce API (although there are almost certainly use cases where that makes sense too), but HDFS and Yarn are pretty ubiquitous.