| HN Mirror



	by gliese1337 4658 days ago
	Probably not, but he did disclaim that it was pretty naive and could be improved in many ways. I think it's a pretty darn good first pass. That particular issue comes about from tagging words that are common in the target text without checking whether or not that's actually significant, i.e., whether a word is common in the text just because it's a common word overall, rather than because it's actually an indication of the text's subject. That should be pretty easy to fix by comparing with an English word frequency list.
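	To make the frequency-list idea concrete, here is a rough sketch, not the original author's code. It scores each word by how much more often it shows up in the text than in general English. The `BACKGROUND_PER_MILLION` table and the `DEFAULT_PER_MILLION` floor are made-up placeholder numbers; a real version would load an actual English word frequency list.

```python
import re
from collections import Counter

# Hypothetical background rates (occurrences per million words of English).
# These numbers are placeholders for illustration only.
BACKGROUND_PER_MILLION = {
    "the": 50000.0,
    "and": 27000.0,
    "of": 30000.0,
    "a": 23000.0,
    "is": 10000.0,
}
# Assumed floor for words that are missing from the background list.
DEFAULT_PER_MILLION = 1.0


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(text, top_n=5):
    """Rank words by (rate in this text) / (rate in general English)."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        doc_rate = count / total * 1_000_000
        background = BACKGROUND_PER_MILLION.get(word, DEFAULT_PER_MILLION)
        scores[word] = doc_rate / background
    # Highest ratio first; break ties alphabetically so the order is stable.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


text = "the cat and the dog and the cat chased the ball"
print(Counter(tokenize(text)).most_common(1))  # raw counts favor "the"
print(distinctive_words(text, top_n=3))
```

	With raw counts "the" wins, but once you divide by the background rate it drops to the bottom and the subject words float up.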

1 comment

Agree. Maybe some TF-IDF solution.
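
For anyone curious what that might look like, here is a minimal TF-IDF sketch in plain Python. It assumes you have a handful of other documents to compare against; the function name `tfidf_scores` and the tiny example corpus are hypothetical. The IDF here is the plain `log(N / df)` form with the target counted as one of the N documents, which is one of several common variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(target, corpus):
    """Score each word in `target` by term frequency times inverse document frequency.

    `corpus` is a list of other documents used as the background.
    """
    other_docs = [set(tokenize(doc)) for doc in corpus]
    words = tokenize(target)
    counts = Counter(words)
    n_docs = len(other_docs) + 1  # the target itself counts as a document
    scores = {}
    for word, count in counts.items():
        tf = count / len(words)
        # Document frequency: the target plus every other doc containing the word.
        df = 1 + sum(word in doc for doc in other_docs)
        idf = math.log(n_docs / df)
        scores[word] = tf * idf
    return scores


corpus = ["the dog sat on the mat", "the weather is nice"]
scores = tfidf_scores("the cat chased the cat", corpus)
print(sorted(scores, key=scores.get, reverse=True))
```

A word like "the" that appears in every document gets an IDF of zero, so it can never be picked as a tag no matter how often it occurs in the target.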