by nrh 3772 days ago
Spotifier here. This is an important point. I have nothing bad to say about Cloudera or HWX (disclaimer: we're an HWX customer - we've had a pretty good experience), but I don't really see a compelling reason at this stage to manage your own cluster(s) (HIPAA/regulatory constraints, maybe?).

Getting shared storage with independently operated and scaled compute clusters on top of it isn't easily achievable with the standard Hadoop stack, and building that on top of HDFS is non-trivial.

1 comment

In fact, I don't think large orgs like you (Spotify) really want independently operated clusters. That prevents easy sharing of data, causing data silos to appear. You really want to have true multi-tenancy, which isn't in Hadoop yet. Hadoop has worked more on Kerberos support at the cost of features like easy-to-use access control - Apache Ranger or Sentry anybody!?!?
Yahoo's Hadoop clusters operate in a multi-tenant fashion for precisely this reason: to ease sharing of data between groups.