Stavrosk,

If all you want is to classify the stories on the front page into a preset list of categories, that's actually pretty simple to do. I've been working on a similar concept for a personal project. Here are my recommendations:

- Be sure to remove stopwords from the titles before running them through the classifier.
- The ankusa gem (https://github.com/bmuller/ankusa) will help you greatly. Ankusa is a naive Bayes text classifier that will come in really handy for the task you are trying to achieve.

Also make sure your training data sets are clean, with as little overlap as possible. Finally, have fun and let us know how it goes! Cheers, and let me know if you have more questions or if you want a hand coding this thing.
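
To make the two recommendations above concrete, here is a rough standalone sketch of the idea in Python: strip stopwords from each title, then score it with a small naive Bayes classifier using add-one smoothing. This is not ankusa's API, just an illustration of the same technique. The stopword list, the category names, and the training titles are all made up for the example, and a real setup would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
             "and", "is", "with", "how", "why", "your"}


def tokenize(title):
    """Lowercase a title, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9+#]+", title.lower())
    return [w for w in words if w not in STOPWORDS]


class NaiveBayes:
    """Minimal multinomial naive Bayes over story titles (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of titles seen
        self.vocab = set()                       # every word seen during training

    def train(self, category, title):
        words = tokenize(title)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, title):
        words = tokenize(title)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category in self.doc_counts:
            # Log prior: how common this category is in the training set.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for w in words:
                # Add-one (Laplace) smoothing so unseen words don't zero the score.
                count = self.word_counts[category][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Made-up training titles, two per category, purely for illustration.
clf = NaiveBayes()
clf.train("programming", "Writing a fast JSON parser in Rust")
clf.train("programming", "Ruby gem for naive Bayes text classification")
clf.train("business", "Startup raises funding round from investors")
clf.train("business", "Why investors pass on your startup pitch")

print(clf.classify("New Rust parser library"))
print(clf.classify("Investors fund another startup"))
```

Keeping the categories' vocabularies distinct is what makes this work with so little data, which is why overlapping training sets hurt: shared words pull the scores for different categories together.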
The actual classification is probably the easy part; the hard part is training the model, which is why I wanted to ask if anyone had done it before. Have you managed to train anything to recognize your tastes, or does it only handle objective categories? How well does it work?