by greenonion 4932 days ago
So, is there anyone using Python for machine learning in production systems (i.e., not just for prototyping)? I would love to do it, but it seems Java/Mahout is a safer choice, performance-wise.

I wonder whether Blaze is a step in that direction.

2 comments

I use Python for nearly all of my ETL processes that involve text processing. Even in production systems, I'd be hard-pressed to name any significant performance issues. Python facilitates implementing algorithms in a functional style, which I tend to prefer over the imperative style (e.g., Java). With C++11 and Boost, I'm able to translate my Python code to C++ while preserving the functional style, which has immensely simplified prototyping and deploying NLP/ML algorithms while also yielding enormous performance gains. I see Python as an extremely viable alternative to Java.
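To make the "functional style" point concrete, here is a minimal sketch of a text pre-processing step written as a chain of small functions. It is not the commenter's code; the names `tokenize`, `drop_stopwords`, `compose`, and the tiny `STOPWORDS` set are all made up for illustration.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical stop list, only for this example.
STOPWORDS = frozenset({"the", "a", "is", "of"})

def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def drop_stopwords(tokens):
    """Return the tokens that are not in the stop list."""
    return [t for t in tokens if t not in STOPWORDS]

def compose(*funcs):
    """Chain single-argument functions, applied left to right."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)

# The pipeline is itself just a function built from smaller ones.
pipeline = compose(tokenize, drop_stopwords, Counter)

counts = pipeline("The cat is a friend of the other cat")
print(counts.most_common(1))
```

Each stage takes a value and returns a new one without touching shared state, which is roughly the property that makes a later port to C++ lambdas and standard algorithms fairly mechanical.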
You got me a bit confused here. If I understand correctly what you're saying, you're still using Python for prototyping the core algorithms and C++ in actual production systems. I'm not saying Python is not good for production systems in general; I'm wondering whether it is good enough for real-world implementations of machine learning algorithms.

Also, I believe most people would consider Java as an alternative to C++, hence all the Java-based Apache projects, such as Mahout, Solr, etc.

I use Python in production for text pre-processing and other ETL-related processes, which are part of a larger reinforcement learning approach. Additionally, I use Python to prototype the core ML algorithms, which I sometimes re-implement in C++. However, for many of those algorithms, numpy actually performs identically to BLAS in C++.
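A rough way to see why numpy can keep up with C++ here: for dense floating-point matrix products, numpy typically hands the work to whichever BLAS library it was built against, so the inner loops never run in the Python interpreter. The sketch below is only an illustration, assuming numpy is installed; `matmul_loops` is a made-up reference implementation used to check the result, not something from the comment above.

```python
import numpy as np

def matmul_loops(a, b):
    """Reference matrix product using plain Python loops (slow)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i, p] * b[p, j]
            out[i, j] = total
    return out

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
a = rng.standard_normal((30, 20))
b = rng.standard_normal((20, 10))

fast = a @ b               # runs in compiled code, usually a BLAS routine
slow = matmul_loops(a, b)  # same arithmetic, interpreted one step at a time

same = np.allclose(fast, slow)
print(same)
```

Timing the two on larger matrices (for example with `timeit`) shows the gap; the exact ratio depends on the machine and the BLAS build, so no figure is quoted here.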
I get it now, thanks. It's very interesting; maybe I will give Python for ML a chance!
Have you tried Scala? It might let you write in a functional style and then not have to translate it to something else. Please don't interpret this as a troll; I'm genuinely curious what the pros/cons of these approaches are.
I've never tried Scala, but I suppose I should give it a chance. I'm a fan of Lisp, and the two languages seem to have a lot in common. Scala's expressive type system seems like it has the potential to be both a blessing and a curse, but admittedly, I know next to nothing about the language.
I may be missing something here, but if you're a fan of Lisp and want easy interaction with libraries on the JVM, please tell me you've heard of Clojure. It's a modern Lisp that strongly favors functional programming and has great concurrency support. Plus, there is already a data analysis / statistical platform built on top of it called Incanter.
We also use Python in production at plotwatt for machine learning. We started by prototyping in MATLAB and then porting to C++, but have since found it much, much easier to just do everything in Python and numpy. When speed was an issue, we slightly changed the way we implemented the algorithm rather than implementing the same algorithm in a faster language. Admittedly, this isn't always possible.
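As an illustration of changing the implementation rather than the language, here is a small sketch of one common trick: replacing a per-window recomputation with running totals. The comment does not say which algorithms were involved, so the sliding-window mean, the function names, and the sample data below are all assumptions chosen for the example.

```python
from itertools import accumulate

def window_means_naive(values, width):
    """Recompute every window's sum from scratch: O(n * width)."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]

def window_means_prefix(values, width):
    """Use running totals so each window costs O(1): O(n) overall."""
    totals = [0.0] + list(accumulate(values))
    return [(totals[i + width] - totals[i]) / width
            for i in range(len(values) - width + 1)]

readings = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up sample data
naive = window_means_naive(readings, 3)
fast = window_means_prefix(readings, 3)
print(naive, fast)
```

Both functions return the same means; only the amount of repeated work differs. The same idea carries over to numpy arrays via `np.cumsum`.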