|
|
|
|
|
by super_sloth
4134 days ago
|
|
Has anyone had some good experiences with Spark? I put several weeks in to moving our machine learning pipeline over to Spark only to find I kept hitting a race condition in their scheduler. After doing a bit of searching, it seems this is actually a known issue https://issues.apache.org/jira/browse/SPARK-4454 and there's been a fix on their github for a while: https://github.com/apache/spark/pull/3345 and yet in that time two releases have swung by bringing a tonne of features. I ended up having to drop Spark ultimately because I wasn't confident about putting it in to production (the random OOMs and NPEs during development weren't great either). Does anyone have any positive experiences? |
|
Lemme tell you though... as someone that has use Hadoop for 5+ years... not waiting 5-10 minutes every time you run new code is worth the trouble. Despite more problems owing to immaturity, or just Spark 'doing less for you' in terms of data validation than other tools like Pig/Hive, if you can get your stuff running on Spark... development is joyous. You just don't have to wait very long during development anymore.
I feel like 5 years of my life were delayed 10 minutes. That did terrible things to my coding that I'm just starting to get over. With Spark I am 10x as productive, and I am limited by my thinking, not the tools.
PySpark in particular is really great.