by btw0 4798 days ago
I built an anti-spam system for Delicious.com using a Naive Bayes classifier with a really huge feature database (think tens of millions of features), mostly tokens from different parts of the page. The features are given different weights, which contribute to the final probability aggregation. The result was similar to what the OP achieved: around 80% accuracy. It was a really interesting and satisfying piece of work.
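To make the idea concrete, here is a minimal sketch of weighted Naive Bayes aggregation. It is not the actual Delicious code. The (part, token) feature layout, the `part_weights` table, and the add-one smoothing are all assumptions made for illustration, and for brevity only features present on the page are scored.

```python
import math
from collections import Counter

def train(docs):
    """docs: iterable of (features, is_spam); each feature is a (part, token) pair."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for features, is_spam in docs:
        counts[is_spam].update(set(features))  # count presence once per document
        n_docs[is_spam] += 1
    return counts, n_docs

def spam_probability(features, counts, n_docs, part_weights=None):
    """Sum weighted per-feature log-likelihood ratios, then map to a probability."""
    part_weights = part_weights or {}  # hypothetical per-part weights, default 1.0
    # Prior log-odds with add-one smoothing.
    log_odds = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for part, token in set(features):
        p_spam = (counts[True][(part, token)] + 1) / (n_docs[True] + 2)
        p_ham = (counts[False][(part, token)] + 1) / (n_docs[False] + 2)
        log_odds += part_weights.get(part, 1.0) * math.log(p_spam / p_ham)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

Working in log-odds keeps the sum stable with millions of features, and the weight simply scales how much each part of the page (title, body, and so on) can move the final score.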

Hmm, interesting... but how do you calculate the weights? Do you use the KL-divergence method?
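To be clear about what I mean, here is a rough sketch of one possible reading of KL-divergence weighting: score each feature by the KL divergence between its presence distribution in spam and in ham. The function names and the add-one smoothing are my own assumptions, not something from the post above.

```python
import math

def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats; p and q must lie strictly in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_weight(spam_with, n_spam, ham_with, n_ham):
    """Hypothetical feature weight: divergence of presence rates, spam vs. ham."""
    # Add-one smoothing keeps both rates away from 0 and 1.
    p = (spam_with + 1) / (n_spam + 2)
    q = (ham_with + 1) / (n_ham + 2)
    return bernoulli_kl(p, q)
```

A feature that shows up equally often in both classes gets a weight of zero, and the weight grows as the two presence rates move apart.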