by nervechannel 5577 days ago
It's actually pretty debatable whether this year's KDD Cup will really help the science of music recommendation:

http://musicmachinery.com/2011/02/22/is-the-kdd-cup-really-m...

That's because the data is entirely anonymised: not just the users but the artists too (cf. Netflix's problems with deanonymization):

http://33bits.org/2010/03/15/open-letter-to-netflix/

This means you can't use any interesting characteristics of the music itself, or the associated metadata, to aid the recommendations. All the interesting domain knowledge is stripped out. That likely means the best solutions still won't work as well as algorithms that use metadata (like Last.fm's), content analysis (like Pandora's), or both, and they certainly won't lead to any particularly interesting insights about what drives people's tastes.

Disclaimer: I work at Last.fm


Very interesting; thanks. I had only taken a cursory look at the KDD cup page.

I didn't know the dataset was crippled. That surprises me, because I doubt the Netflix attack would work with music: there's no IMDB for music that could act as an independent dataset. Unless they intersect the set with Last.fm, of course :)