by nautilus12
2179 days ago
I'm going to give you my slightly biased and annoyed answer. It seems like people who use Python tend to look down on Spark as "too complicated" because it is written in Scala. I come from a Scala background and now feel forced into using Python for my data work because of the momentum it has. I am still amazed at how quickly simple requests, like using a different image or attaching some jars, can make Python people say "whoa, that's complicated, how can anyone like Spark?" Personally I love Spark (for all its quirks). I think the Spark DataFrame is more mature than pandas in many ways, and I value the sanity that type-driven programming brings to the table. I'm kind of sad that I'm probably going to have to use Python for the rest of my career, because it causes so many fires and comes with a real strong tendency to kick things down the road. The community just generally strikes me as very impatient.
|
Spark is ace because it has a SQL API available across languages, which makes ETL much more effective, and it also ships ML models (though I've always been sort of suspicious about their maturity).
tl;dr: demonstrate the speed of running regressions in Spark, and many (most) data scientists will invest the time in learning the tool.