by diego 6635 days ago
Check out our tool, http://tagger.flaptor.com

It's based on a Bayesian algorithm plus a bunch of other heuristics for fine-tuning. In our case, the classification algorithm is not nearly as important as how we select documents for the training set.
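
For anyone who hasn't seen the Bayesian part before, here is a rough sketch of a multinomial naive Bayes text classifier with add-one smoothing. To be clear, this is not the tagger's actual code: the `NaiveBayesTagger` class, its method names, and the toy training sentences are all made up for illustration, and it leaves out the heuristics and the training-set selection mentioned above.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only; not the implementation behind the tool.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> number of training docs
        self.vocab = set()                       # every word seen in training

    def train(self, text, tag):
        words = text.lower().split()
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, float("-inf")
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Made-up training documents, just to exercise the sketch.
tagger = NaiveBayesTagger()
tagger.train("python code compiler bug", "programming")
tagger.train("rust compiler memory safety", "programming")
tagger.train("stock market prices fall", "finance")
tagger.train("bank raises interest rates", "finance")
print(tagger.classify("compiler bug in python"))
```

Notice that the only thing the model knows is what `train` was fed, which is why the choice of training documents matters so much.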

Also, take a look at this post that was mentioned here a few days ago:

More data usually beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.h...