
Hacker News

by diego 6682 days ago

Check out our tool, http://tagger.flaptor.com

It's based on a Bayesian algorithm plus a bunch of other heuristics for fine-tuning. In our case, the classification algorithm is not nearly as important as how we select documents for the training set.
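For readers new to the approach, here is a minimal sketch of a Bayesian text tagger. It assumes a plain multinomial naive Bayes model with Laplace smoothing and whitespace tokenization. It is not Flaptor's implementation: the class name `NaiveBayesTagger` and the `alpha` parameter are hypothetical, and the fine-tuning heuristics and training-set selection are not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes tagger (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant (assumed)
        self.tag_doc_counts = Counter()       # number of training docs per tag
        self.word_counts = defaultdict(Counter)  # per-tag word frequencies
        self.total_words = Counter()          # total word count per tag
        self.vocab = set()                    # all words seen in training

    def train(self, text, tag):
        # Naive whitespace tokenization; a real tagger would do much more here.
        words = text.lower().split()
        self.tag_doc_counts[tag] += 1
        self.word_counts[tag].update(words)
        self.total_words[tag] += len(words)
        self.vocab.update(words)

    def score(self, text, tag):
        # log P(tag) + sum over words of log P(word | tag), with smoothing.
        n_docs = sum(self.tag_doc_counts.values())
        log_p = math.log(self.tag_doc_counts[tag] / n_docs)
        denom = self.total_words[tag] + self.alpha * len(self.vocab)
        for w in text.lower().split():
            count = self.word_counts[tag][w]  # Counter returns 0 for unseen words
            log_p += math.log((count + self.alpha) / denom)
        return log_p

    def classify(self, text):
        # Pick the tag with the highest log-posterior score.
        return max(self.tag_doc_counts, key=lambda t: self.score(text, t))


tagger = NaiveBayesTagger()
tagger.train("python code compiler bug", "programming")
tagger.train("goal match striker league", "sports")
print(tagger.classify("compiler bug report"))  # programming
```

The model only counts words in whatever documents it is given, so its output is only as good as the examples passed to `train`.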

Also, take a look at this post that was mentioned here a few days ago:

More data usually beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.h...