by hackinthebochs 4795 days ago
This looks awesome. I've been itching to try out some ideas I have after going through Bishop's book, but I've been hesitant to write the algorithms from scratch. Now I'll have to decide between learning MATLAB and using a library such as this.

You should definitely use a scriptable ML library. The process is very iterative and not suited to a compiled language like C++. I use scikit-learn a lot, but the MATLAB toolboxes and R are also great. At its heart ML is a lot of stats, so use something built for maths, not C++. It doesn't really make sense to break out C++ until you know exactly which algorithm and settings you need and your application is real time.
I very much agree with this. I had even been thinking of making a DSL (probably Scheme-based) oriented to ML instead of a library. I would find that more useful in the exploration phase.
If the OP was thinking of writing his own algorithms, and this is a linkable library with that heavy math already implemented, couldn't he write bindings for Python/Lua/Tcl/Ruby and have everything he needs for script-ability, or am I missing something?
You aren't missing anything; you are absolutely correct. But the question isn't "can they?", it's "will they?"
I'm in the Stanford/Coursera machine learning course right now, and something like this is nearly exactly what I've been looking for.

As some others have said, GPLv3 is off-putting, but there is the LGPL mlpack lib (http://www.mlpack.org/), which is also C++. Personally, project-wise, the only way this could be improved is if the project were pure C and under a BSD, MIT, or similar license. Quite looking forward to checking these out, though.

Honestly, you guys are crazy if you think Shark is going to help you learn machine learning. It's ideal for deploying ML on things like embedded computers, robotics, games, etc., where real-time learning is required. Machine learning requires a lot of experimentation, and C++ is a terrible medium for that. There are loads of good machine learning libraries implemented for Python and MATLAB. Pretty much every good paper in machine learning is accompanied by an algorithm implemented in MATLAB, Python, or R. Learn using those reference implementations. Once you've figured out what you want, then by all means deploy on a system in C++ using Shark. I do robotics for a living, and I do go from scripting to C++. Unless it's absolutely necessary I avoid C++. Only things like vision, which is so CPU-hungry that it has its own computer, require C++; every other algorithm stays in Python.
Actually, you forget that performance is critical when you need to train for days at a time. If I used Octave/MATLAB/R, my current project might take months to train instead of weeks. All my ML code is high-performance threaded C++. I recommend using a good template linear algebra library like Eigen; you can do plenty of experimentation in C++. I find that, with a few modern libraries and the required experience, a C++ programmer is just as efficient as a Python/R/MATLAB programmer, if not more so. It comes down to the skill of the programmer and the proper choice of libraries.
Speaking as a PhD student in machine learning:

Implement the algorithm yourself, first, in Python+NumPy. The only reason I feel comfortable with Gaussian Processes and SVMs is that I wrote the code to solve them manually.

Once you're happy with the basics, and can test your ideas with code you intimately understand, optimise for speed by using a library like this.

Implementing the SVM from scratch was time-consuming, no?
The only tricky part would be writing a quadratic solver. Alternatives: either solve a linear SVM using gradient descent (simpler to write), or offload the core of the algorithm to an existing solver like cvxopt.

edit: For an example of using cvxopt, check out http://www.mblondel.org/journal/2010/09/19/support-vector-ma...
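
To give a feel for how little code the gradient-descent route needs, here is a minimal sketch of a linear SVM trained by batch subgradient descent on the regularised hinge loss. It is plain Python to stay dependency-free (NumPy would vectorise the inner loops). The function names, the hyperparameters `lam`, `lr` and `epochs`, and the toy data are all made up for illustration; this is not taken from the linked post or from any library.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels in y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point is inside the margin: hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the decision function, returned as +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (hypothetical, for illustration only).
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

With a fixed step size the weights keep jittering around the optimum rather than settling exactly, so a decaying learning rate would probably be the first thing to add.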

Another approach is to implement Platt's SMO:

http://en.wikipedia.org/wiki/Sequential_minimal_optimization
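
For the curious, a rough sketch of the idea, assuming the simplified teaching variant of SMO (second multiplier picked at random, linear kernel) rather than Platt's full selection heuristics. The names `simplified_smo` and `smo_predict`, the defaults for `C`, `tol`, `max_passes` and `max_sweeps`, and the toy data are hypothetical.

```python
import random


def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, max_sweeps=10000, seed=0):
    """Optimise the SVM dual two multipliers at a time; returns (alpha, b)."""
    rng = random.Random(seed)
    n = len(X)
    # Precompute the linear-kernel Gram matrix.
    K = [[sum(a * c for a, c in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def f(i):  # decision value for training point i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # Only touch multipliers that violate the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            Ej = f(j) - y[j]
            ai_old, aj_old = alpha[i], alpha[j]
            # Box bounds that keep both multipliers in [0, C] on the constraint line.
            if y[i] != y[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            aj = min(hi, max(lo, aj_old - y[j] * (Ei - Ej) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            alpha[i], alpha[j] = ai, aj
            # Recompute the threshold so the updated points satisfy KKT.
            b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
            b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def smo_predict(X, y, alpha, b, x):
    """Classify x as +1 or -1 using the support vectors (linear kernel)."""
    s = sum(a * yi * sum(p * q for p, q in zip(xi, x)) for a, yi, xi in zip(alpha, y, X))
    return 1 if s + b >= 0 else -1


# Toy, linearly separable data (hypothetical, for illustration only).
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]
alpha, b = simplified_smo(X, y)
print([smo_predict(X, y, alpha, b, x) for x in X])
```

Swapping the inner products for a kernel function is the only change needed for a non-linear SVM; the random choice of `j` is what a real implementation would replace with Platt's heuristics.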

cool - thanks
Yes it was :) but time well spent imo.

On reflection, I guess I might have had more free time to spend on this than a normal person. I did the SVM as a [small] part of my master's project, so if you're time-constrained with a real job and a life then it might be best to disregard me.

If you had the quadratic solver, I would think it would be reasonable to add the rest of the code. If you started adding costs, gammas, etc., I would think it would take a while. I spent hours looking at the source code of LIBSVM at my last job and never really understood what the hell was going on.
I do agree with you regardless, just curious.