Hacker News new | ask | show | jobs
by jashmenn 5807 days ago
No, I'm sorry, but I can't use keyword filtering for what I'm describing. Let me explain:

What I'm talking about here is uncovering "latent" communities, if you will. As in, make a giant matrix with the users being the columns and the posts being the rows and then use the eigenvectors to make recommendations (see SVD: http://en.wikipedia.org/wiki/Singular_value_decomposition)

The benefit of this approach is that I no longer have to be conscious of the topics I am filtering in or out. Even keyword based filtering is, again, a coarse estimation of relevance. I may be very interested in clojure, but I'm certainly not interested in every article that contains 'clojure' in the title.

An SVD (or similar) approach would filter my interests loosely on the co-occurrence of votes. That is, a vote from someone with whom I have high overlap is worth more to me than a vote from someone with whom I have never voted the same direction on the same post.

1 comments

I question whether SVD would yield good recommendations.

In any case, co-voting data is not scrape-able from the public HN site, so I think using keywords and urls is really the only realistic filtering option at this point.

You can use people's comments as a (loose) proxy for their interest in a post; people who comment on something are more likely to have upvoted it (or at least consider it worthwhile to talk about, even if they never really vote on things.) You could perhaps even use Sentiment analysis, and take negative (root-level) comments as downvotes (and prune any branch below a negative comment, because it's probably an argument.)