by jldugger 1287 days ago
Yes, this is pretty much TF-IDF for people too lazy to count the number of unique items in the corpus.

Since that number should be the same (or at least close!) in both good and bad datasets, I'm not sure the extra math matters much.
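
To make that concrete, here's a rough sketch. It assumes "that number" plays the role of the corpus size N in the textbook idf = log(N / df), and that the good and bad datasets share the same N. The function names are made up for illustration, and this may not be the exact scoring the article uses. Under those assumptions, N cancels when you compare a term's IDF across the two datasets:

```python
import math


def idf(doc_freq, corpus_size):
    """Textbook inverse document frequency: log(N / df)."""
    return math.log(corpus_size / doc_freq)


def idf_gap(df_good, df_bad, corpus_size):
    """IDF of a term in the bad set minus its IDF in the good set.

    Assumes both sets have the same corpus_size (hypothetical setup).
    """
    return idf(df_bad, corpus_size) - idf(df_good, corpus_size)


def raw_gap(df_good, df_bad):
    """The same comparison using only the raw counts, with no N at all."""
    return math.log(df_good / df_bad)


if __name__ == "__main__":
    # Hypothetical counts: the term appears in 40 good items and 5 bad items.
    for n in (1_000, 1_000_000):
        print(n, idf_gap(40, 5, n), raw_gap(40, 5))
```

The two columns agree for any N, since log(N / df_bad) - log(N / df_good) = log(df_good / df_bad). If the two corpus sizes differ, the gap just picks up a constant offset of log(N_bad / N_good), which shifts every term equally and so doesn't change the ranking.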