


	by jldugger 1287 days ago
	Yes, this is pretty much TF-IDF for people too lazy to count the number of unique items in the corpus. Since that number should be the same (or at least close!) in both good and bad datasets, I'm not sure the extra math matters much.
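
	To make the "extra math" point concrete, here is a rough sketch, not the original author's method. It assumes both datasets have the same number of documents and uses the textbook IDF form log(N / df); the term counts, `n_docs`, and all function names are made up for illustration. Since log(N / df) = log N - log df, the log N part is a constant that cancels when you compare a term across the two datasets.

```python
import math

def idf_full(df, n_docs):
    # Textbook inverse document frequency: log(N / df).
    return math.log(n_docs / df)

def idf_lazy(df):
    # Same thing without the corpus-size term: just -log(df).
    return -math.log(df)

# Hypothetical document frequencies (how many documents contain each term).
df_good = {"error": 2, "timeout": 10, "ok": 40}
df_bad = {"error": 25, "timeout": 12, "ok": 38}
n_docs = 50  # assumed to be the same for both datasets

def gap_full(term):
    # Difference in full IDF between the good and bad datasets.
    return idf_full(df_good[term], n_docs) - idf_full(df_bad[term], n_docs)

def gap_lazy(term):
    # Same difference using the "lazy" score with no corpus size.
    return idf_lazy(df_good[term]) - idf_lazy(df_bad[term])

# Within one dataset, both scores also rank terms in the same order.
rank_full = sorted(df_good, key=lambda t: idf_full(df_good[t], n_docs))
rank_lazy = sorted(df_good, key=lambda t: idf_lazy(df_good[t]))

for term in df_good:
    print(term, round(gap_full(term), 6), round(gap_lazy(term), 6))
```

	If the two corpora differ in size, the gaps differ by log(N_good / N_bad) for every term, which is the "or at least close" caveat: a small constant offset rather than a change in which terms stand out.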