Hacker News
by burningion 3993 days ago
I just started going down this path. I began by applying machine learning to audio analysis: detecting a specific audio pattern that humans recognize very easily. I can't get too specific about it, as it's under NDA, but I had a little under two weeks to build a prototype that would either prove or disprove that it was possible.

The very first thing I did was take a step back and understand the domain of the data I was working with, and what the best way to present it for machine learning would be. In my case, I had to understand what the best format for presenting my audio would be (slightly modified MFCCs), and what the best library would be to get my data in that format.
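
To make the feature step concrete, here is a minimal sketch of a plain textbook MFCC computation in NumPy. It is not the modified variant I used (that stays under NDA), and the function names and defaults (16 kHz audio, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients) are assumptions chosen for illustration, not recommendations.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). Assumes len(signal) >= frame_len."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log energy in each mel band (small constant avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II decorrelates the band energies; keep the first n_coeffs.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n_filters) + 0.5) / n_filters)
    return energies @ basis.T

# Usage: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

In practice an audio library would probably handle this for you; writing it out once mostly helps you see which knobs (frame size, number of filters, number of coefficients) you might want to modify for your own data.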

Next, I needed to build a data set of proper training data. This meant I had to manually build a (largish) data set that matched exactly what I was looking for. So I went and downloaded a bunch of example audio, and then manually went through it, tagging it into the two bins I was looking to differentiate between.
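
Once clips are tagged into two bins, it helps to set aside a held-out portion right away so that both labels show up in training and in testing. A rough sketch, assuming the tags live in a list of (clip_id, label) pairs; the clip names, the labels, and the `stratified_split` helper are all hypothetical:

```python
import random

def stratified_split(labeled_clips, test_fraction=0.2, seed=0):
    """Split (clip_id, label) pairs so every label appears in both train and test."""
    rng = random.Random(seed)
    by_label = {}
    for clip_id, label in labeled_clips:
        by_label.setdefault(label, []).append(clip_id)
    train, test = [], []
    for label, clips in sorted(by_label.items()):
        clips = sorted(clips)   # fixed order so the seed gives a repeatable shuffle
        rng.shuffle(clips)
        n_test = max(1, int(round(len(clips) * test_fraction)))
        test += [(c, label) for c in clips[:n_test]]
        train += [(c, label) for c in clips[n_test:]]
    return train, test

# Usage with made-up clip names and two bins.
clips = [("pos_%02d.wav" % i, "pattern") for i in range(10)]
clips += [("neg_%02d.wav" % i, "other") for i in range(10)]
train_set, test_set = stratified_split(clips)
```

Splitting per label rather than over the whole list keeps a small or lopsided data set from ending up with a test set that contains only one bin.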

Once I had this (which actually took much more time than the learning itself), I was ready to do the actual machine learning. I used Theano, and figuring out how to translate my dataset into a format Theano could digest took another chunk of time. Once my data was in the proper format, it came down to basically playing with how I presented my initial data to Theano, and then tweaking my gradient.
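
To give a feel for the "get the data into arrays, then tweak the gradient" loop without depending on Theano, here is a stand-in sketch: a two-class logistic regression trained by plain gradient descent in NumPy. This is not the net I trained; the model, the `train_logistic` and `predict` names, the learning rate, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def train_logistic(X, y, learning_rate=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the logistic loss."""
    X = np.asarray(X, dtype=np.float32)   # one row per example, one column per feature
    y = np.asarray(y, dtype=np.float32)   # labels are 0.0 or 1.0
    w = np.zeros(X.shape[1], dtype=np.float32)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of label 1
        grad_w = X.T @ (p - y) / len(y)          # gradient of the mean loss w.r.t. w
        grad_b = float(np.mean(p - y))           # gradient of the mean loss w.r.t. b
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(X, w, b):
    """Return 0/1 predictions by thresholding the linear score at zero."""
    return (np.asarray(X, dtype=np.float32) @ w + b > 0).astype(int)

# Usage on synthetic data: two clusters standing in for the two bins.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                    rng.normal(2.0, 1.0, size=(100, 2))])
y_demo = np.array([0] * 100 + [1] * 100)
w_demo, b_demo = train_logistic(X_demo, y_demo)
```

The learning rate and number of epochs are the kind of thing you end up adjusting by hand; how you scale and lay out the input features usually matters at least as much.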

Finally, I was able to train a net that was right about 80% of the time, which supported my hypothesis. There were a few edge cases I hadn't anticipated that wouldn't necessarily work well, but it gave us enough confidence to go ahead with more machine learning for our project.
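
A single accuracy number hides which kind of mistakes the model makes, so it is worth counting the four outcomes separately on held-out data; that is usually where the edge cases show up. A small sketch with made-up labels (1 for the pattern, 0 for everything else); the helper names are hypothetical:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            counts["tp"] += 1
        elif truth == 0 and guess == 0:
            counts["tn"] += 1
        elif truth == 0 and guess == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

def accuracy(counts):
    """Fraction of predictions that were correct (0.0 if there were none)."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0

# Usage with made-up held-out labels and predictions.
held_out_truth = [1, 1, 1, 0, 0]
held_out_guess = [1, 1, 0, 0, 1]
counts = confusion_counts(held_out_truth, held_out_guess)
print(counts, accuracy(counts))
```

Listening to the clips behind the false positives and false negatives is a quick way to find out what the data set is missing.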

So, takeaway suggestions: find a real project, something you want to learn, and then just do it. Gather knowledge of your data, build a dataset, and test a hypothesis. Most of this isn't machine learning; it's mostly just moving and shaping data, and knowing what in your data is significant. The machine learning algorithms are really just a tiny piece of the whole picture. Good luck.