| HN Mirror

	by jmillerinc 5807 days ago
	You can implement what you want with simple keyword & url filtering.

3 comments

jashmenn 5807 days ago

No, I'm sorry, but I can't use keyword filtering for what I'm describing. Let me explain:

What I'm talking about here is uncovering "latent" communities, if you will. That is, build a giant matrix with users as the columns and posts as the rows, then use its singular vectors to make recommendations (see SVD: http://en.wikipedia.org/wiki/Singular_value_decomposition).
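
To make the matrix idea concrete, here is a rough sketch and not a tuned recommender. It assumes a tiny invented vote matrix (rows are posts, columns are users, 1 means an upvote) and swaps a library SVD for plain power iteration with deflation so it runs on the standard library alone. The names `top_singular_triplet` and `low_rank_scores` are made up for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_singular_triplet(matrix, iters=200, seed=0):
    """Approximate the largest singular value and its vectors by power iteration."""
    rng = random.Random(seed)
    transposed = [list(col) for col in zip(*matrix)]
    v = [rng.random() for _ in matrix[0]]
    for _ in range(iters):
        # One step of power iteration on (A^T A).
        v = matvec(transposed, matvec(matrix, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0, [0.0] * len(matrix), v
        v = [x / norm for x in v]
    u = matvec(matrix, v)
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * len(matrix), v
    return sigma, [x / sigma for x in u], v


def low_rank_scores(votes, rank):
    """Rebuild the vote matrix from its `rank` strongest singular triplets."""
    residual = [list(map(float, row)) for row in votes]
    scores = [[0.0] * len(votes[0]) for _ in votes]
    for _ in range(rank):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(len(votes)):
            for j in range(len(votes[0])):
                term = sigma * u[i] * v[j]
                scores[i][j] += term
                residual[i][j] -= term  # deflate before finding the next triplet
    return scores


# Invented data: users 0-1 vote on posts 0-2, users 2-3 vote on posts 3-5.
# User 0 has not voted on post 2, but user 1 (who overlaps with user 0) has.
votes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
predicted = low_rank_scores(votes, rank=2)
```

With this toy data, `predicted[2][0]` comes out clearly positive (post 2 is suggested to user 0) while `predicted[3][0]` stays near zero, even though user 0 voted on neither post.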

The benefit of this approach is that I no longer have to be conscious of the topics I am filtering in or out. Even keyword-based filtering is, again, a coarse estimation of relevance. I may be very interested in clojure, but I'm certainly not interested in every article that contains 'clojure' in the title.

An SVD (or similar) approach would filter my interests loosely on the co-occurrence of votes. That is, a vote from someone with whom I have high overlap is worth more to me than a vote from someone with whom I have never voted the same direction on the same post.
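
The overlap weighting can also be sketched directly, without any matrix factorization. This is a simpler neighbor-style stand-in, not SVD itself. It assumes votes are stored as `{post_id: +1 or -1}` dictionaries per user; the function names and sample data are hypothetical.

```python
def agreement(my_votes, their_votes):
    """Fraction of shared posts on which two users voted the same direction."""
    shared = set(my_votes) & set(their_votes)
    if not shared:
        return 0.0
    same = sum(1 for post in shared if my_votes[post] == their_votes[post])
    return same / len(shared)


def score_posts(my_votes, other_users):
    """Score posts I have not voted on, weighting each vote by voter agreement."""
    scores = {}
    for their_votes in other_users.values():
        weight = agreement(my_votes, their_votes)
        for post, vote in their_votes.items():
            if post in my_votes:
                continue  # only score posts that are new to me
            scores[post] = scores.get(post, 0.0) + weight * vote
    return scores


# Invented sample data.
me = {"a": 1, "b": 1}
others = {
    "ally": {"a": 1, "b": 1, "c": 1},        # always agrees with me
    "stranger": {"d": 1},                    # no shared votes
    "opposite": {"a": -1, "b": -1, "e": 1},  # never agrees with me
}
scores = score_posts(me, others)
```

Here only the ally's upvote on "c" carries any weight; votes from the stranger and from the user who never agrees with me contribute nothing.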

jmillerinc 5807 days ago

I question whether SVD would yield good recommendations.

In any case, co-voting data is not scrape-able from the public HN site, so I think using keywords and urls is really the only realistic filtering option at this point.
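
For comparison, keyword and URL filtering needs very little code. A minimal sketch, assuming each story is just a title plus a URL and that the block lists are maintained by hand; `keep_story` and its parameters are hypothetical names.

```python
from urllib.parse import urlparse


def keep_story(title, url, blocked_keywords, blocked_domains):
    """Return False if the title has a blocked keyword or the URL a blocked domain."""
    lowered = title.lower()
    if any(keyword.lower() in lowered for keyword in blocked_keywords):
        return False
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself and any subdomain such as www.
    return not any(
        host == domain or host.endswith("." + domain) for domain in blocked_domains
    )
```

Something like `keep_story("New iPad review", "http://example.com/x", ["ipad"], [])` would drop the story, while a title with no blocked keyword from an unblocked domain passes through.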

derefr 5806 days ago

You can use people's comments as a (loose) proxy for their interest in a post; people who comment on something are more likely to have upvoted it (or at least consider it worthwhile to talk about, even if they never really vote on things). You could perhaps even use sentiment analysis, and take negative (root-level) comments as downvotes (and prune any branch below a negative comment, because it's probably an argument).
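
A rough sketch of that comment-to-vote mapping, assuming comments arrive as nested dictionaries with `author`, `text` and `replies` keys (an invented shape, not a real HN data format). The sentiment check is passed in as a hypothetical `is_negative` callable, since no particular classifier is implied here.

```python
def implied_votes(root_comments, is_negative):
    """Map a comment tree to {author: +1 or -1} implied votes on the post."""
    votes = {}

    def walk(comment):
        # Commenting at all counts as a loose upvote.
        votes.setdefault(comment["author"], 1)
        for reply in comment.get("replies", []):
            walk(reply)

    for root in root_comments:
        if is_negative(root["text"]):
            # A negative root-level comment counts as a downvote, and the
            # branch below it is pruned because it is probably an argument.
            votes[root["author"]] = -1
            continue
        walk(root)
    return votes


# Invented sample thread and a toy stand-in for sentiment analysis.
thread = [
    {"author": "a", "text": "great post",
     "replies": [{"author": "b", "text": "agreed", "replies": []}]},
    {"author": "c", "text": "this is terrible",
     "replies": [{"author": "d", "text": "no it isn't"}]},
]
votes = implied_votes(thread, lambda text: "terrible" in text)
```

In this sample, "a" and "b" count as upvotes, "c" as a downvote, and "d" is ignored because the reply sits under a negative root comment.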

w1ntermute 5807 days ago

Speaking of this, does anyone know of a good RSS filter? By that I mean a service to which you give a link to an RSS feed and provide certain filters, and they will provide a link to a modified version of the feed that they host themselves.

barrkel 5807 days ago

Yahoo Pipes is the closest thing I know to that.

http://pipes.yahoo.com/pipes/

Here's a screenshot of one of mine http://imgur.com/NLOkM.png

seancron 5807 days ago

Check out http://feedrinse.com/

w1ntermute 5807 days ago

Awesome, this is exactly what I wanted! By filtering out all the Apple/iPhone/iPad-related crap, I'll probably be able to cut the number of new items from tech blogs in half.

seancron 5807 days ago

And if it's a Gawker Media site, they let you filter their feed by changing the URI. For example, http://gizmodo.com/tag/not:apple/not:iphone/not:ipad/index.x...

I wish more sites had that kind of filtering.

w1ntermute 5807 days ago

Oh, that's even better. I don't subscribe to Gizmodo, but I do subscribe to Lifehacker.

thebigshane 5806 days ago

Everyone forgets about postrank.com (formerly aiderss.com). It is actually what I use to filter HN... but who knows, I may switch over to the HN50 or HN100 described above.

jshen 5807 days ago

I don't think that achieves the same thing. First, it requires the user to manually filter out new stuff they aren't interested in, whereas a machine learning approach will evolve as the content space evolves.

Think of it like email spam. You can set up manual filters to filter out email spam, but that is a constant and never-ending stream of work for you. A simple Bayesian filter like pg has described will require far less work and give far better results.

In this case, a machine learning approach is even better because it can bring up stories that a user will be very interested in even though the story would never make it to the current homepage.
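
As a rough illustration of the Bayesian idea, here is a plain naive Bayes sketch over story titles with add-one smoothing. It is not the exact scoring from pg's essay, and the class name, method names and training titles are all made up. A positive score means "probably interesting".

```python
import math
from collections import Counter


class TitleFilter:
    """Tiny naive Bayes classifier over whitespace-split title words."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, title, interesting):
        """Record one title the user did (True) or did not (False) like."""
        self.doc_counts[interesting] += 1
        self.word_counts[interesting].update(title.lower().split())

    def log_odds(self, title):
        """Log odds that a title is interesting; above 0 leans interesting."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        totals = {c: sum(self.word_counts[c].values()) for c in (True, False)}
        # Smoothed class prior.
        score = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for word in title.lower().split():
            # Add-one smoothing, with one extra slot for unseen words.
            p_yes = (self.word_counts[True][word] + 1) / (totals[True] + len(vocab) + 1)
            p_no = (self.word_counts[False][word] + 1) / (totals[False] + len(vocab) + 1)
            score += math.log(p_yes / p_no)
        return score


# Invented training data.
story_filter = TitleFilter()
story_filter.train("clojure macros explained", True)
story_filter.train("clojure concurrency talk", True)
story_filter.train("ipad launch rumors", False)
story_filter.train("iphone ipad sales", False)
```

After this training, `story_filter.log_odds("new clojure talk")` is positive and `story_filter.log_odds("ipad rumors")` is negative, and the scores keep adapting as more votes are recorded with `train`.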
