Hacker News
by gliese1337 4658 days ago
Probably not, but he did disclaim that it was pretty naive and could be improved in many ways. I think it's a pretty darn good first pass. That particular issue comes from tagging words that are common in the target text without checking whether that frequency is actually significant, i.e., whether a word is common in the text just because it's a common word overall, rather than because it actually indicates the text's subject. That should be pretty easy to fix by comparing against an English word frequency list.
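
To make the frequency-list idea concrete, here's a rough sketch in Python. It is not the original author's tagger. The function name `suggest_tags`, the `min_count` cutoff, and the numbers in `BACKGROUND_PER_MILLION` are all made up for illustration; a real version would load an actual English frequency list.

```python
import re
from collections import Counter

# Hypothetical background rates (occurrences per million words of general
# English). Placeholder values for illustration, not a real frequency list.
BACKGROUND_PER_MILLION = {
    "the": 50000.0, "of": 30000.0, "and": 27000.0, "a": 23000.0,
    "in": 20000.0, "is": 10000.0, "orbit": 8.0, "telescope": 5.0,
}
DEFAULT_PER_MILLION = 1.0  # assumed rate for words missing from the list


def suggest_tags(text, top_n=3, min_count=2):
    """Rank words by how over-represented they are relative to general English."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        if count < min_count:
            continue  # ignore one-off words, which would otherwise dominate
        rate_in_text = count / total * 1_000_000
        background = BACKGROUND_PER_MILLION.get(word, DEFAULT_PER_MILLION)
        scores[word] = rate_in_text / background
    # Highest ratio first: common in this text, rare in English overall.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


text = ("The telescope is in orbit. The orbit of the telescope is stable "
        "and the telescope is a marvel.")
print(suggest_tags(text, top_n=2))
```

With these placeholder rates, "the" is the most frequent word in the sample but scores lowest, because its rate in the text is close to its background rate.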
1 comment

Agree. Maybe some TF-IDF solution.
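
For anyone unfamiliar, a minimal TF-IDF sketch might look like the following. It assumes you have some other texts to compare against (the `corpus` argument here is hypothetical), and it uses the plain `tf * log(N / df)` weighting with no smoothing; `tfidf_tags` is an illustrative name, not anything from the original project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_tags(target, corpus, top_n=3):
    """Rank the words of `target` by TF-IDF, using `corpus` as the other documents."""
    docs = [tokenize(target)] + [tokenize(doc) for doc in corpus]
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    # Term frequency within the target document only.
    term_counts = Counter(docs[0])
    total = len(docs[0])

    scores = {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in term_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


target = "the telescope is in orbit and the telescope is stable"
corpus = [
    "the cat is in the garden",
    "the stock market is stable and in decline",
]
print(tfidf_tags(target, corpus, top_n=2))
```

Words that show up in every document ("the", "is", "in") get an IDF of log(1) = 0 and drop out, which is the same effect as the frequency-list comparison but learned from the corpus itself.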