Hacker News new | ask | show | jobs
by rjurney 4134 days ago
Spark is less mature than Hadoop, so you will run into issues like this. In my experience, advocating for the bug to get fixed often results in it getting fixed... on a several month timeline. This happened with Avro support in Python. I advocated for the patch and someone supplied it in the next version of Spark.

Lemme tell you though... as someone that has use Hadoop for 5+ years... not waiting 5-10 minutes every time you run new code is worth the trouble. Despite more problems owing to immaturity, or just Spark 'doing less for you' in terms of data validation than other tools like Pig/Hive, if you can get your stuff running on Spark... development is joyous. You just don't have to wait very long during development anymore.

I feel like 5 years of my life were delayed 10 minutes. That did terrible things to my coding that I'm just starting to get over. With Spark I am 10x as productive, and I am limited by my thinking, not the tools.

PySpark in particular is really great.

1 comments

Seriously ./spark-shell is a godsend for development.

And I love the fact you can press Tab and get autocompletion of methods.