by marssaxman 2951 days ago
I've been working on a tool for analyzing my DJ track library, with the goal of doing a little semi-supervised learning to help organize my crates, remind me about music that might be getting lost in the shuffle, and - eventually, if all goes well - automatically scan Beatport, SoundCloud, etc. for new tracks and notify me about music that might fit into my library. It doesn't actually do anything useful yet, but even if it never comes together the way I'm imagining, it's already been a great way to learn ML:

http://www.github.com/marssaxman/robocrate


That sounds like a great idea. I used to have a decent vinyl collection, and would often forget about some of the good ones I had after not playing them for a while.

There isn't any info on the GitHub page. Could you describe a bit how exactly it organizes the music?

There's no info yet partly because I've been trying a variety of approaches, and I'm not sure yet which approach will work out best. The core of the tool is a scanner which extracts an audio feature vector for each track in your library. Armed with this feature matrix, we can apply clustering algorithms - the most successful so far has been a Gaussian mixture model. I'm currently working on a system which will hopefully improve accuracy by bootstrapping a feature selection model, using metadata tags as an initial ground truth for music similarity, thereby allowing us to reduce the actual number of features which need to be compared.
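
To make the pipeline described above concrete, here is a minimal sketch in Python using NumPy and scikit-learn. It is not taken from robocrate: the feature matrix is synthetic, the tags are made-up style labels, and the univariate F-test in `select_features` is only one simple stand-in for the feature selection model mentioned above. The function names and parameters are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def select_features(features, tags, k):
    """Return indices of the k features that best separate the tag groups.

    Assumption: tags (e.g. genre) serve as rough ground truth for similarity.
    """
    selector = SelectKBest(score_func=f_classif, k=k).fit(features, tags)
    return selector.get_support(indices=True)


def cluster_tracks(features, n_clusters, seed=0):
    """Fit a Gaussian mixture to an (n_tracks, n_features) matrix.

    Returns hard cluster labels and per-cluster membership probabilities.
    """
    # Standardize so no feature dominates purely because of its scale.
    scaled = StandardScaler().fit_transform(features)
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="diag",  # assumption: fewer parameters to estimate
        n_init=5,                # keep the best of several initializations
        random_state=seed,
    )
    gmm.fit(scaled)
    return gmm.predict(scaled), gmm.predict_proba(scaled)


# Synthetic stand-in for scanner output: 120 tracks, 8 features each.
# Features 0-3 depend on a made-up style tag; features 4-7 are pure noise.
rng = np.random.default_rng(0)
style = np.repeat([0, 1, 2], 40)
offsets = np.array([0.0, 5.0, -5.0])[style]
informative = offsets[:, None] + rng.normal(scale=0.5, size=(120, 4))
noise = rng.normal(size=(120, 4))
features = np.hstack([informative, noise])

kept = select_features(features, style, k=4)
labels, probs = cluster_tracks(features[:, kept], n_clusters=3)
print("kept feature indices:", kept)
print("cluster sizes:", np.bincount(labels))
```

On a real library the number of clusters is not known in advance; comparing `GaussianMixture.bic` across a few candidate values of `n_components` is one common way to choose it.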

I started out imagining that this tool would continuously update the contents of my crates, but now I think I want it to be more of a manual process. I'm imagining it as an analysis and reporting tool more than an organizer; I'll ask it to identify the outliers in a given crate, or ask for suggested additions, then choose how to arrange things myself. This way, I can use the manually-curated organization of my library as additional training data for the similarity model.
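
The two reports mentioned above (outliers within a crate, and suggested additions) could be sketched as follows. This assumes the simplest possible similarity measure, distance from the crate's centroid scaled by the crate's own per-feature spread; the thread does not say how the actual tool scores tracks. The function name `crate_report` and the synthetic data are hypothetical.

```python
import numpy as np


def crate_report(features, crate, n_outliers=3, n_suggestions=3):
    """Rank crate members by distance from the crate centroid (outliers)
    and rank non-members by closeness to it (suggested additions).

    features: (n_tracks, n_features) array.
    crate: indices of the tracks currently in the crate.
    """
    crate = np.asarray(crate)
    members = features[crate]
    center = members.mean(axis=0)
    spread = members.std(axis=0) + 1e-9  # avoid division by zero

    # Distance of every track from the centroid, in units of crate spread.
    dist = np.linalg.norm((features - center) / spread, axis=1)

    # Crate members farthest from the centroid come first.
    outliers = crate[np.argsort(dist[crate])[::-1][:n_outliers]]

    # Non-members closest to the centroid come first.
    others = np.setdiff1d(np.arange(len(features)), crate)
    suggestions = others[np.argsort(dist[others])[:n_suggestions]]
    return outliers, suggestions


# Synthetic library: tracks 0-19 share one style, tracks 20-39 another.
rng = np.random.default_rng(0)
style_a = rng.normal(loc=0.0, scale=0.5, size=(20, 4))
style_b = rng.normal(loc=5.0, scale=0.5, size=(20, 4))
library = np.vstack([style_a, style_b])

# A crate holding fifteen style-A tracks plus one misfiled style-B track.
crate = list(range(15)) + [20]
outliers, suggestions = crate_report(library, crate)
print("most out-of-place crate members:", outliers)
print("suggested additions:", suggestions)
```

Keeping the output as ranked lists rather than applying changes automatically matches the manual workflow described above: the tool reports, and the DJ decides what to move.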