by tlarkworthy 4795 days ago
You should definitely use a scriptable ML library. The process is very iterative and not suited to a compiled language like C++. I use scikit-learn a lot, but the MATLAB toolboxes or R are also great. At its heart ML is a lot of stats, so use something built for maths, not C++. It doesn't really make sense to break out C++ until you know exactly what algorithm and settings you need and your application is real time.
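To give a feel for what "very iterative" means here, this is a minimal sketch of the kind of settings sweep that is quick to write and rerun in a scripting language. It is not from the comment above: the data is synthetic, the helper names `ridge_fit` and `sweep` are made up for illustration, and plain numpy is used instead of any particular ML library.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def sweep(X_train, y_train, X_val, y_val, lams):
    """Return {lam: validation mean squared error} for each candidate lam."""
    results = {}
    for lam in lams:
        w = ridge_fit(X_train, y_train, lam)
        residual = X_val @ w - y_val
        results[lam] = float(np.mean(residual ** 2))
    return results

# Synthetic data, invented purely for this illustration.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.0, 0.5])
X = rng.normal(size=(200, 4))
y = X @ true_w + 0.1 * rng.normal(size=200)

# Try several regularisation strengths and keep the best on held-out data.
scores = sweep(X[:150], y[:150], X[150:], y[150:],
               lams=[0.0, 0.1, 1.0, 10.0, 100.0])
best_lam = min(scores, key=scores.get)
```

Changing the list of settings, the model, or the error measure is a one-line edit followed by a rerun, which is the loop the comment is describing.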

I very much agree with this. I had even been thinking of making a DSL (probably Scheme-based) oriented to ML instead of a library. I would find that more useful in the exploration phase.
If the OP was thinking of writing his own algorithms, and this is a linkable library with that heavy math already implemented, couldn't he write bindings for Python/Lua/Tcl/Ruby and have everything he needs for script-ability, or am I missing something?
You aren't missing anything, you are absolutely correct, but the question isn't "can they?"; the question is "will they?"
I'm in the Stanford/Coursera machine learning course right now, and something like this is nearly exactly what I've been looking for.

As some others have said, GPLv3 is off-putting, but there is the LGPL mlpack lib (http://www.mlpack.org/) (also C++). Personally, project-wise, the only way this could be improved is if the project were pure C and under a BSD, MIT, or similar license. Quite looking forward to checking these out, though.

Honestly, you guys are crazy if you think Shark is going to help you learn machine learning. It's ideal for deploying ML on things like embedded computers, robotics, games, etc., where real-time learning is required. Machine learning requires a lot of experimentation, and C++ is a terrible medium for that. There are loads of good machine learning libraries implemented for Python and MATLAB. Pretty much every good paper in machine learning is accompanied by an algorithm implemented in MATLAB, Python, or R. Learn using those reference designs. Once you have figured out what you want, then by all means deploy on a system in C++ using Shark. I do robotics for a living, and I do go from scripting to C++. Unless it's absolutely necessary, I avoid C++. I only require C++ for things like vision, which is so CPU hungry that it has its own computer; every other algorithm stays in Python.
Actually, you forget that performance is critical when you need to train for days at a time. If I used Octave/MATLAB/R, my current project might take months to train instead of weeks. All my ML code is high-performance threaded C++. I recommend you use a good template linear algebra library like Eigen; you can do plenty of experimentation in C++. I find that with a few modern libraries and the required experience, a C++ programmer is just as efficient as a Python/R/MATLAB programmer, if not more. It comes down to the skill of the programmer and the proper choice of libraries.
True, MATLAB, Octave and R are all rubbish for performance. I use Python + numpy, which delegates to BLAS for the hardcore linear algebra stuff. I don't normally find C++ gains me all that much. You can also do GPU acceleration pretty easily using Theano (e.g. http://deeplearning.net/software/theano/tutorial/using_gpu.h...)

So I reckon my GPU-accelerated Python still beats a C++ pthreads approach, and is a lot faster to develop on.

Your mileage may vary. From what you said, you probably know what you are doing; maybe GPU is not applicable. I was really replying to the initial comments that said they want to start learning machine learning on a C++ system. Training for days suggests you are doing something hardcore like MCMC/DBN/Gaussian processes; learners should not start there, though.
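To make the Python + numpy point above concrete, here is a rough sketch, not taken from the thread, that computes the same matrix product twice: once with interpreted Python loops and once with numpy's `@` operator, which for floating-point arrays is normally handed to whatever BLAS numpy was built against. The matrix size and the helper name `matmul_loops` are arbitrary choices for illustration, and the timings will depend entirely on the machine and the BLAS build.

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Naive triple-loop matrix product on nested Python lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            out[i][j] = total
    return out

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 60))
B = rng.normal(size=(60, 60))

# Interpreted version: every multiply-add goes through the Python interpreter.
start = time.perf_counter()
slow = matmul_loops(A.tolist(), B.tolist())
loop_seconds = time.perf_counter() - start

# numpy version: a single call, with the inner loops run in compiled code.
start = time.perf_counter()
fast = A @ B
numpy_seconds = time.perf_counter() - start

# Both approaches should agree up to floating-point rounding.
agree = np.allclose(slow, fast)
print(f"loops: {loop_seconds:.4f}s  numpy: {numpy_seconds:.6f}s  agree: {agree}")
```

The script stays a few lines of Python either way; the heavy arithmetic is what gets pushed down into compiled code.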