by rm999 3629 days ago
I don't get the innovation in this paper - are they just running word2vec on groups of items? If so, Spotify has been doing this on playlists for years now: https://erikbern.com/2013/11/02/model-benchmarks/

Also, I know the paper isn't claiming state-of-the-art, but their SVD results are horrendous. Standard CF would create much better artist-artist pairings with even a medium-sized dataset.

As an aside, I've run some quantitative and qualitative tests and have found the best recommendations come from a combination of user-item and item-item. I co-gave a talk at the NYC machine learning meetup recently (https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...) that shows how this can work, starting at slide 20. The idea is to create a candidate list of matches using item-item, and then reorder that list using user-item. I've found that the item-item step produces "sensible" suggestions, and the re-ordering step is what truly personalizes them. You can remove obvious recommendations by dropping popular matches or matches the user has already interacted with (I consider this a business decision rather than something inherent in the algorithm).
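
For anyone who wants to see the two-stage idea concretely, here is a minimal sketch. It is not the code from the talk: it assumes items and users already have embedding vectors in the same space, and the names `recommend`, `k_candidates`, and `exclude` are made up for illustration. Cosine similarity for the item-item step and a dot product for the user-item step are also assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user_vec, seen, item_vecs, k_candidates=2, exclude=()):
    """Hypothetical two-stage recommender: item-item candidates, user-item rerank."""
    # Stage 1 (item-item): take the nearest neighbours of every item
    # the user has already interacted with.
    candidates = set()
    for seed in seen:
        neighbours = sorted(
            (i for i in item_vecs if i != seed),
            key=lambda i: cosine(item_vecs[seed], item_vecs[i]),
            reverse=True,
        )
        candidates.update(neighbours[:k_candidates])

    # Business-rule filter: drop items already seen, plus anything the
    # caller wants excluded (for example, overly popular items).
    candidates -= set(seen) | set(exclude)

    # Stage 2 (user-item): reorder the candidates by the user's own score.
    def user_score(item):
        return sum(a * b for a, b in zip(user_vec, item_vecs[item]))

    return sorted(candidates, key=user_score, reverse=True)
```

The filter sits between the two stages on purpose, so the popularity or already-seen rules can change without touching either model.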

5 comments

Spotify got this from Berkeley Lab, who were doing it in 2005: "Word2Vec is based on an approach from Lawrence Berkeley National Lab" (https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...). That is interesting because the original streaming music site, SeeqPod, which powered Spotify, was based on vectors for songs, like a song2vec.
From the Spotify blog post: "We train a model on subsampled (5%) playlist data using skip-grams and 40 factors."

Any idea what those 40 factors might be?

(The item2vec paper describes using pairs of items that occur in the same set, i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
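
A rough sketch of that pair generation, in case it helps. This is only my reading of "every pair of items in the same set"; the function name `set_pairs` is made up, and emitting both (target, context) orderings is an assumption rather than something taken from the paper.

```python
from itertools import permutations


def set_pairs(baskets):
    """Yield (target, context) pairs for all distinct items sharing a set.

    Unlike a sliding n-gram window, the "window" here is the whole set,
    and the position of an item inside the set is ignored.
    """
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate; order carries no meaning
        yield from permutations(items, 2)
```

A set of k distinct items yields k * (k - 1) pairs, so very large sets dominate training unless they are subsampled.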

That's the dimensionality of the resulting word vectors in word2vec; in the item2vec paper this is the "dimension parameter m".
Yeah, I "invented" this in 2011 or 2012 and it was one of the ideas behind the company that I sold. At the time I thought it was a clever hack, but I didn't see it as especially non-obvious.
Hi, very informative talk, especially the examples for handling cold start and seeding. Any pointers on how multiple entities are incorporated in the interaction matrix? I understand how user/item attributes may be incorporated in the interaction matrix, but multiple entities is something that I am struggling to understand. Pointers to associated literature would help too.
This paper covers mixing different types: http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf (note that it describes a related but different technique). See figure 1 for an example of mixing ratings, indicator variables, and time into a single matrix.
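
As a toy illustration of putting different kinds of columns into one matrix, here is a sketch. It is not a reproduction of the paper's figure: the event fields (`user`, `item`, `time`, `rating`) and the function name `encode` are hypothetical, and treating the rating as the prediction target is an assumption.

```python
def encode(events):
    """Build one feature row per event: user indicator, item indicator, time."""
    users = sorted({e["user"] for e in events})
    items = sorted({e["item"] for e in events})
    # Each entity type gets its own block of indicator columns;
    # real-valued features such as time are simply appended.
    columns = [f"user={u}" for u in users] + [f"item={i}" for i in items] + ["time"]
    index = {name: j for j, name in enumerate(columns)}

    X, y = [], []
    for e in events:
        row = [0.0] * len(columns)
        row[index[f"user={e['user']}"]] = 1.0
        row[index[f"item={e['item']}"]] = 1.0
        row[index["time"]] = float(e["time"])
        X.append(row)
        y.append(e["rating"])  # assumed target variable
    return columns, X, y
```

Adding another entity type (say, a playlist or a device) would just mean another block of indicator columns in the same row.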
Thanks a lot :-)
"[before computing SVD], we normalized each entry according to the square root of the product of its row and column sums."

Why didn't they use something that usually works better, like PMI?

This is a normalization that I have used and seen other people use. I don't think it's a foregone conclusion that PMI is better for every task.
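
For comparison, here is a small sketch of the two weightings applied to a co-occurrence count matrix. The first function follows the quoted description (divide each entry by the square root of the product of its row and column sums); the second is ordinary PMI. Mapping zero counts to 0.0 in the PMI version is my assumption (true PMI is negative infinity there), and both function names are made up.

```python
from math import log, sqrt


def _sums(counts):
    """Row sums and column sums of a non-negative count matrix."""
    return [sum(r) for r in counts], [sum(c) for c in zip(*counts)]


def sqrt_normalize(counts):
    """Entry x_ij becomes x_ij / sqrt(row_sum_i * col_sum_j)."""
    row, col = _sums(counts)
    return [
        [x / sqrt(row[i] * col[j]) if x else 0.0 for j, x in enumerate(r)]
        for i, r in enumerate(counts)
    ]


def pmi(counts):
    """Entry x_ij becomes log(x_ij * total / (row_sum_i * col_sum_j))."""
    row, col = _sums(counts)
    total = sum(row)
    return [
        # Zero counts are mapped to 0.0 by assumption instead of -inf.
        [log(x * total / (row[i] * col[j])) if x else 0.0 for j, x in enumerate(r)]
        for i, r in enumerate(counts)
    ]
```

Both divide by the same row and column sums. PMI divides by their full product and takes a log, while the square-root version divides by the root of that product and stays on a linear scale.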