Hacker News
by suresk 3472 days ago
I have played around with scikit-learn and love how simple and easy it is to work with, but the story for scaling it doesn't seem super straightforward - is this something anyone here has experience with?

I built a recommendation system in Spark earlier this year that took terabytes of input. I ran it on a 40-node EMR cluster, where it finished in less than half an hour. It wasn't trivial to make it run in a clustered environment, but it wasn't very hard either.

1 comment

Out of curiosity, were you using spark-scala or pyspark?
I was using Scala.