by disgruntledphd2
2179 days ago
Spark is really, really good. For a lot of data scientists, though, it's a massive leap from the Python/R model of "play around with a data.frame till I have a model, then wrap it up in a script", which causes problems. Spark is ace because it has an SQL API available cross-language, which makes ETL much more effective, and it has ML models too (though I've always been sort of suspicious about their maturity). tl;dr: demonstrate the speed of running regressions in Spark, and many (most) data scientists will invest the time in learning the tool.