by nerdponx 1287 days ago
You might be interested in the TF-IDF algorithm used in information retrieval and text classification.

Yes, this is pretty much TF-IDF for people too lazy to count the number of unique items in the corpus.

Since that number should be the same (or at least close!) in both good and bad datasets, I'm not sure the extra math matters much.
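For anyone who wants to see the "extra math" spelled out, here is a minimal sketch of textbook TF-IDF. It assumes the common form where the corpus-level quantity is the number of documents (`n_docs`); the function name `tf_idf` and the toy `docs` are made up for illustration and are not from the linked post.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        scores.append({
            # term frequency times log inverse document frequency
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]
scores = tf_idf(docs)
```

Since log(n_docs / df) = log(n_docs) - log(df), a corpus size that is the same in both datasets only adds the same constant to every term's IDF, which is the reason the normalization may not change a comparison much.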