Hacker News
by timtadh 3593 days ago
Don't get me started on setting up production HDFS and Hadoop; that is a nightmare. And it is comparatively well documented, with lots of people running it! As soon as you get into the weeds with Kerberos and HA mode, forget about the documentation explaining anything properly. Cargo culting from random blog posts, reading the source, and just playing around with config files is the name of the game. There are some weird interactions between KDC settings and some of the daemons that are not documented at all.
4 comments

I'm happy to see marcoceppi mentioning Juju here - I'm one of the enablers of Juju big data.

We've worked really hard to make it simple to stand up Hadoop on clouds, containers, and metal (https://jujucharms.com/hadoop-processing/). Juju brings the modeling, Bigtop brings the core apps. Scaling, observing, and integrating are old news; HA is landing now; your post and others like it have put security on our -next radar.

Having read a few KDC/HDFS stories, I think I'm going to miss the days when dfs.permissions.enabled was good enough ;)

Spot on! Once you get everything running, you're so exhausted that documenting it in a blog post is not on your mind; sleeping for a week is...
Oh God yes, this nonsense is most of my life - Kerberos, AD, and my favourite uncovered area of enterprise integration, storage. NFSv4 plus Kerberos, anyone?
This has been my experience with Spark too! On one hand, it's exciting working on new things changing so fast that the "best way" isn't common knowledge yet. On the other hand, having to grep through source to find out what a config option really does is just painful.
Have you tried using Juju to set up things like HDFS and Hadoop and the things that plug into them?