| HN Mirror

	by suresk 3472 days ago
	I have played around with scikit-learn and love how simple it is to work with, but the story for scaling it doesn't seem straightforward. Is this something anyone here has experience with? Earlier this year I built a recommendation system in Spark that took terabytes of input; I ran it on a 40-node EMR cluster, where it finished in less than half an hour. Making it run in a clustered environment wasn't trivial, but it wasn't very hard either.

1 comment

Out of curiosity, were you using Spark with Scala or PySpark?

I was using Scala.