by andrewguenther 4795 days ago
You aren't missing anything; you are absolutely correct. But the question isn't "can they?", it's "will they?"

I'm in the Stanford/Coursera machine learning course right now, and something like this is nearly exactly what I've been looking for.

As some others have said, GPLv3 is off-putting, but there is the LGPL mlpack lib (http://www.mlpack.org/) (also C++). Personally, project-wise, the only way this could be improved is if the project were pure C and under a BSD, MIT, or similar license. Quite looking forward to checking these out, though.

Honestly, you guys are crazy if you think Shark is going to help you learn machine learning. It's ideal for deploying ML on things like embedded computers, robotics, games, etc., where real-time learning is required. Machine learning requires a lot of experimentation, and C++ is a terrible medium for that. There are loads of good machine learning libraries for Python and MATLAB. Pretty much every good paper in machine learning is accompanied by an algorithm implemented in MATLAB, Python, or R. Learn using those reference designs. Once you've figured out what you want, then by all means deploy on a system in C++ using Shark. I do robotics for a living, and I do go from scripting to C++. Unless it's absolutely necessary, I avoid C++. I only need C++ for things like vision, which is so CPU-hungry that it has its own computer; every other algorithm stays in Python.
Actually, you forget that performance is critical when you need to train for days at a time. If I used Octave/MATLAB/R, my current project might take months to train instead of weeks. All my ML code is high-performance threaded C++. I recommend using a good template linear algebra library like Eigen; you can do plenty of experimentation in C++. I find that with a few modern libraries and the required experience, a C++ programmer is just as efficient as a Python/R/MATLAB programmer, if not more so. It comes down to the skill of the programmer and the proper choice of libraries.
True, MATLAB, Octave, and R are all rubbish for performance. I use Python + NumPy, which delegates to BLAS for the hardcore linear algebra stuff. I don't normally find C++ gains me all that much. You can also do GPU acceleration pretty easily using Theano (e.g. http://deeplearning.net/software/theano/tutorial/using_gpu.h...)

So I reckon my GPU-accelerated Python still beats a C++ pthreads approach, and is a lot faster to develop on.
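To make the NumPy/BLAS point above concrete, here is a minimal sketch (not anyone's actual code from this thread) that computes the same matrix product twice: once with plain Python loops and once with NumPy's `@` operator, which for float arrays is normally handed off to whatever BLAS NumPy was built against. The function names `naive_matmul`, `numpy_matmul`, and `time_call` are made up for illustration, and the 120x120 size is arbitrary. Actual timings depend on your machine and BLAS build, so none are quoted here.

```python
import time

import numpy as np


def naive_matmul(a, b):
    """Multiply two matrices given as lists of lists, using plain Python loops."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return out


def numpy_matmul(a, b):
    """Same product via NumPy; float64 matmul is typically dispatched to BLAS."""
    return np.asarray(a, dtype=np.float64) @ np.asarray(b, dtype=np.float64)


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call (hypothetical helper)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((120, 120))
    b = rng.standard_normal((120, 120))

    slow, t_slow = time_call(naive_matmul, a.tolist(), b.tolist())
    fast, t_fast = time_call(numpy_matmul, a, b)

    # Both paths should agree up to floating-point rounding.
    assert np.allclose(slow, fast)
    print(f"pure Python loops: {t_slow:.4f}s, NumPy: {t_fast:.6f}s")
```

The gap between the two timings is the part of the argument that favors scripting: the inner loops already run in compiled code, so rewriting the outer logic in C++ buys less than you might expect.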

Your mileage may vary. From what you said, you probably know what you are doing, and maybe GPU is not applicable. I was really replying to the initial comments that said they want to start learning machine learning on a C++ system. Training for days suggests you are doing something hardcore like MCMC/DBN/Gaussian processes; learners should not start there, though.

I'm doing deep belief networks with dropout, and don't have access to GPUs with good double-precision performance. I used to write graphics device drivers, so GPU computing has a special place in my heart, and I definitely agree with you there performance-wise. It is funny, though, that my little laptop is hitting training times similar to some papers where people are using low-end GPUs; it's amazing what you can do when you pay attention to performance.
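For readers who haven't met dropout, here is a rough NumPy sketch of the idea. It is not the tuned C++ described above and says nothing about how that code is structured. It assumes the common "inverted" formulation, where surviving units are rescaled during training so nothing needs to change at test time; the function name `dropout` and its parameters are hypothetical.

```python
import numpy as np


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout on an array of activations.

    During training, each unit is zeroed with probability drop_prob and the
    survivors are scaled by 1 / (1 - drop_prob) so the expected value of each
    unit is unchanged. Outside training the input is returned untouched.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True where the unit is kept.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = np.ones((4, 6))
    print(dropout(hidden, 0.5, rng))
```

With `drop_prob=0.5` and an input of all ones, every output entry is either 0.0 or 2.0, and the mean over a large array stays close to 1.0.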

I suspect my tuned C++ code will work quite well on an Intel MIC, and that is probably where I'm going to go when I have more resources to throw at the problem. I do know that Theano uses Alex's C++ CUDA code under the covers, and I have done lots of reading of Theano's code, looking at implementation details to help develop my own. I am just not a big fan of Python (or most scripting languages, actually); perhaps I'm just too old-school and have written C, C++, C#, and Java for too long. If it doesn't smell or feel like C, I feel like Scotty in Star Trek 4 when he was making the transparent aluminum on the Mac.