Hacker News
by StavrosK 4892 days ago
Thanks for your answer! What I'm thinking of making basically separates posts into two categories: things that interest me and things that don't. Then I want to receive emails at intervals I specify. That way I no longer have the urge to check HN frequently, but I still stay up to date.
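A minimal Python sketch of the digest step, assuming each post is a (title, url, posted_at) record and that a trained classifier is already available as an `is_interesting` predicate. The `Post` type and both function names are hypothetical; actually sending the email and scheduling the interval are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class Post:
    # Hypothetical record shape for a scraped HN story.
    title: str
    url: str
    posted_at: datetime


def build_digest(
    posts: Iterable[Post],
    is_interesting: Callable[[Post], bool],
    since: datetime,
    interval: timedelta,
) -> List[Post]:
    """Keep the interesting posts published in the window [since, since + interval)."""
    until = since + interval
    return [
        p for p in posts
        if since <= p.posted_at < until and is_interesting(p)
    ]


def format_digest(posts: List[Post]) -> str:
    """Render the selected posts as a plain-text email body."""
    if not posts:
        return "Nothing interesting this time."
    return "\n".join(f"- {p.title}\n  {p.url}" for p in posts)
```

Calling `build_digest` once per interval, advancing `since` by `interval` each time, puts every post in at most one email.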

The actual classification is probably the easy part; the hard part is training the model, which is why I wanted to ask whether anyone had done it before. Have you managed to train anything to recognize your own tastes, or does it use objective categories? How well does it work?


Well, my classifier works on categories such as ruby, programming, php, magento, etc.

To train the classifier, I grabbed feeds from different subreddits and used them as a base data set. What you are trying to achieve sounds more like a recommendation engine than a classifier; maybe recommendify would come in handy: https://github.com/paulasmuth/recommendify

You can still use the Bayesian classifier. To train it, I would recommend the supervised training route: start with a small dataset (around 100 records) and manually classify each of the training examples.
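A minimal sketch of that route in Python: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on hand-labelled post titles. The class name, the tokenizer and the `interesting`/`boring` labels are assumptions for illustration, not the commenter's actual classifier.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()       # label -> number of training docs
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        """Add one manually labelled example; counts are updated incrementally."""
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text: str) -> dict[str, float]:
        """Return log P(label) + sum of log P(word | label), skipping unseen words."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text: str) -> str:
        """Pick the label with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("Rails 3.2 performance tips for Ruby developers", "interesting")
nb.train("Bayesian classifier for spam in Python", "interesting")
nb.train("Celebrity startup founder buys a yacht", "boring")
nb.train("Funding round gossip and valuations", "boring")
```

Because `train` only updates counts, new hand-labelled examples (or later corrections) can be added one at a time without rebuilding the model from scratch.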

You should also build in some way to give feedback to your classifier, so you can correct its mistakes and improve the results.

Yeah, I'll have upvotes and downvotes to tell it what I liked or didn't. Unfortunately, I can't see a way to do this without supervised learning (maybe semi-supervised would work), which is why I posted here for ideas (I want to avoid the costly supervision step if someone knows the result won't work).
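One common semi-supervised option is self-training, sketched here in Python under stated assumptions: the classifier is passed in as a `fit` function that returns a predictor giving (label, confidence), and `keyword_fit` is a hypothetical toy stand-in, not a real model. The 0.9 threshold and the round limit are arbitrary choices.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Labelled = Tuple[str, str]                      # (post title, label)
Predictor = Callable[[str], Tuple[str, float]]  # title -> (label, confidence)


def self_train(
    labelled: Sequence[Labelled],
    unlabelled: Sequence[str],
    fit: Callable[[Sequence[Labelled]], Predictor],
    threshold: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Predictor, List[Labelled]]:
    """Fit on the labelled set, adopt confident predictions on unlabelled
    posts as pseudo-labels, refit, and repeat."""
    train_set = list(labelled)
    pool = list(unlabelled)
    predict = fit(train_set)
    for _ in range(max_rounds):
        confident: List[Labelled] = []
        remaining: List[str] = []
        for text in pool:
            label, confidence = predict(text)
            if confidence >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new cleared the threshold; stop early
        train_set.extend(confident)
        pool = remaining
        predict = fit(train_set)
    return predict, train_set


def keyword_fit(data: Sequence[Labelled]) -> Predictor:
    """Toy stand-in classifier: each known word votes for the labels it was seen with."""
    word_labels: defaultdict = defaultdict(Counter)
    for text, label in data:
        for word in text.lower().split():
            word_labels[word][label] += 1

    def predict(text: str) -> Tuple[str, float]:
        votes: Counter = Counter()
        for word in text.lower().split():
            votes.update(word_labels.get(word, Counter()))
        if not votes:
            return "unknown", 0.0
        label, count = votes.most_common(1)[0]
        return label, count / sum(votes.values())

    return predict


predict, data = self_train(
    [("ruby gem", "interesting"), ("celebrity gossip", "boring")],
    ["ruby rails", "rails tips", "gossip column"],
    keyword_fit,
)
```

Self-training can reinforce its own early mistakes, which is why only predictions above a high confidence threshold are adopted; the upvotes and downvotes stay the ground truth.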

Thanks for your comments, they help a lot.

I suppose you could train the classifier by having it record what you upvote, or which links you click on. Perhaps a Firefox/Chrome extension could do that?

Some people at Reddit were programming a recommender about a year ago: http://www.reddit.com/r/redditdev/comments/lowwf/attempt_2_w... It doesn't use a naive Bayes classifier, but it might still interest you.

I'm currently using a very simple bookmarklet scheme: one bookmarklet for upvotes and one for downvotes. It works very well for collecting data; I think I'll train the classifier later tonight.
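A sketch of the storage side of such a scheme, assuming each bookmarklet click ends up calling something like `append_vote` with the page title, URL and an `up` or `down` label. The JSON Lines file format and all function names are hypothetical; the bookmarklet JavaScript and whatever HTTP handler receives the click are not shown.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

VALID_LABELS = {"up", "down"}


def vote_record(title: str, url: str, label: str) -> str:
    """Serialise one bookmarklet click as a single JSON line."""
    if label not in VALID_LABELS:
        raise ValueError(f"label must be one of {sorted(VALID_LABELS)}, got {label!r}")
    return json.dumps({"title": title, "url": url, "label": label})


def append_vote(path: Path, title: str, url: str, label: str) -> None:
    """Append a vote to a JSON Lines file (one JSON object per line)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(vote_record(title, url, label) + "\n")


def load_votes(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn stored votes into (title, label) training pairs.

    If the same URL was voted on more than once, the latest vote wins,
    so a second click acts as a correction.
    """
    latest: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        rec = json.loads(line)
        latest[rec["url"]] = (rec["title"], rec["label"])
    return list(latest.values())
```

Training later is then a matter of opening the file and passing its lines to `load_votes`; the resulting (title, label) pairs can go straight into a classifier's training step.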

Thank you for the link, it looks very extensive, I'll peruse it later on.