by enraged_camel 3520 days ago
As a side note, a lot of tutorials I've seen on machine learning use Python, and I'm curious as to why. Is it simply the number of libraries that have been developed for ML tasks, or is there something about Python the language that makes it especially suitable (versus, say, Ruby or Haskell)?
4 comments

Disclaimer: I run a site discussing Python/ML topics as applied to quant finance.

Python is primarily used because the machine learning libraries within it are very mature and play nicely with each other.

It is easy to get started in Python (and most of its libraries) by downloading the freely available Anaconda distribution. This usually "just works" cross-platform. The language itself is extremely straightforward to pick up.

Within the Python ecosystem there are many mature libraries. In particular, NumPy was written for carrying out vectorised computation. This enabled more libraries, such as pandas (for dataframe manipulation), SciPy (for general scientific computation) and scikit-learn (for ML), to be developed. Each of these libraries also possesses a clean and consistent API for carrying out its specialty tasks.
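
To make "vectorised computation" concrete, here is a minimal sketch. The price series and variable names are made up for illustration; the point is only that NumPy lets you replace an explicit loop with a single array expression.

```python
import numpy as np

# Hypothetical data: five daily closing prices for one asset.
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1])

# Plain-Python version: compute simple returns one element at a time.
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1.0)

# Vectorised version: the same calculation as one expression over
# shifted array slices, with no explicit Python loop.
vec_returns = prices[1:] / prices[:-1] - 1.0
```

Both versions give the same numbers; the vectorised one is shorter and, on large arrays, typically much faster because the looping happens in compiled code.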

Thus it becomes straightforward in Python to import data from many sources, "wrangle" it into the correct format (even with real-world, messy data), put it into an ML data pipeline and then visualise it easily (via Matplotlib or Seaborn). In addition there is Jupyter for straightforward "notebook" style research.
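
As a rough sketch of that workflow (leaving out the plotting step), something like the following works. The column names and values are invented for the example, and the choice of imputer, scaler and classifier is just one reasonable assumption, not a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical "messy" data: numbers stored as strings, a junk entry,
# and a couple of missing values.
raw = pd.DataFrame({
    "feature_a": ["1.0", "2.5", "n/a", "4.0", "5.5", "6.0", "7.5", "8.0"],
    "feature_b": [0.1, np.nan, 0.3, 0.4, 0.5, 0.6, np.nan, 0.8],
    "label": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Wrangle: coerce the string column to numbers; unparseable entries become NaN.
raw["feature_a"] = pd.to_numeric(raw["feature_a"], errors="coerce")

X = raw[["feature_a", "feature_b"]]
y = raw["label"]

# ML pipeline: fill missing values, standardise, then fit a classifier.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
```

The pandas DataFrame goes straight into the scikit-learn pipeline with no conversion code, which is a small illustration of the libraries playing nicely with each other.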

Finally, Theano and TensorFlow are two great deep learning libraries. There are a few hiccups on installation sometimes, but for the most part they "just work".

There are still some "missing pieces", however. The statsmodels library does a good job of time series analysis, but it doesn't yet compete fully with R in this respect.

Julia is also likely to make serious inroads into Python's usage in the near future. I'm excited about where the project is heading.

Python is easy to learn, a lot of people already know it, it has a ton of libraries, and it is used professionally in machine learning.

My understanding is that ML is largely driven by academics, not professional programmers, and as a result they tend to gravitate to easy-to-understand languages like Python. A similar thing seems to have happened with data science, statistics, etc.

> is there something about Python ... that makes it especially suitable[?]

Yes. Python has the best collection of sufficiently user-friendly and fast modules for machine learning. Other languages tend to have fast, friendly, or many modules, but not all three. I suppose R is somewhat competitive on those aspects, but R isn't a great general-purpose language.