by disgruntledphd2 2179 days ago
Spark is really, really good. For a lot of data scientists, though, it's a massive leap from the Python/R model of "play around with a data.frame till I have a model, then wrap it up in a script", which causes problems.

Spark is ace as it has an SQL API available cross-language, which makes ETL much more effective, as well as ML models (though I've always been sort-of suspicious about their maturity).

tl;dr - demonstrate the speed of running regressions in Spark, and many (probably most) data scientists will invest the time in learning the tool.