Hacker News
by joshvm 3472 days ago
One advantage of Python is that it makes machine learning really easy, or at least incredibly accessible. You can write and test a classifier in about 5 lines of Python using scikit-learn. The second point is that virtually all the latest deep learning packages now come with Python frontends by default. For stats you could also use SPSS.
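To give a rough idea of the "about 5 lines" claim, here is a minimal sketch. The dataset (iris), the model (logistic regression) and the 25% hold-out split are my own picks for illustration, not anything specific from the discussion; any scikit-learn estimator would slot in the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier and report mean accuracy on the held-out rows.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in a different model is mostly a one-line change, since estimators share the same fit/score interface.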

The other advantage of Python is that as a scripting language it's very powerful for data wrangling and pre-processing, without needing all the boilerplate that e.g. C++ would require.
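As a small illustration of the wrangling point, here is a sketch using only the standard library. The CSV contents, the column names and the `mean_score_by_group` helper are all made up for the example; in practice most people would probably reach for pandas for anything larger.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical messy input: stray whitespace and one missing score.
RAW = """name,group,score
alice,a, 3.5
bob,b,
carol,a,4.5
dave,b,2.0
"""


def mean_score_by_group(text):
    """Parse CSV text, skip rows with no score, and average per group."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        value = row["score"].strip()
        if not value:  # skip rows with a missing score
            continue
        scores[row["group"].strip()].append(float(value))
    return {group: mean(values) for group, values in scores.items()}


print(mean_score_by_group(RAW))
```

There is no type declaration, build step or manual memory handling here, which is roughly the boilerplate a C++ version would need.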

3 comments

Not to join a flame war, but R makes it pretty easy to test multiple models on a single dataset as well. I have also noticed it does better stats and missing data handling out of the box.
I have played around with scikit-learn and love how simple and easy it is to work with, but the story for scaling it doesn't seem super straightforward - is this something anyone here has experience with?

I built a recommendation system in Spark earlier this year that used terabytes of input. I ran it on a 40-node EMR cluster, where it took less than half an hour. It wasn't trivial to make it run in a clustered environment, but it wasn't very hard either.

Out of curiosity, were you using spark-scala or pyspark?
I was using Scala.
If you consider SPSS an alternative, you'll probably have no real use for R. I agree that Python is more approachable for people with a CS background (unless you're a fan of array processing languages), but R actually is a nice language for data-centric tasks.