


	by oelmekki 3159 days ago
	As an alternative to tf-idf, there's an interesting property of the word embeddings generated by word2vec: the vocabulary is sorted by frequency, with the most common words at the top of the list. So if you insert the words into a database in that same order, you can simply use each word's primary key as its weight: the rarer the word, the higher its id and therefore its weight. This also has the advantage of weeding out stop words without any additional processing, since they end up with the smallest ids and so the lowest weights.
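	A minimal sketch of the idea, using Python's built-in sqlite3 module. The only thing taken from the comment is that words are inserted in the order word2vec wrote them (most frequent first). The table name `words`, the helper functions, the toy vocabulary and the `stop_rank` cutoff are all hypothetical, added for illustration.

```python
import sqlite3


def build_vocab_table(conn, words_by_frequency):
    # Hypothetical schema. With an INTEGER PRIMARY KEY, SQLite assigns
    # ids 1, 2, 3, ... to rows inserted into an empty table, so each id
    # records the word's position in word2vec's frequency-sorted vocabulary.
    conn.execute(
        "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO words (word) VALUES (?)", ((w,) for w in words_by_frequency)
    )


def word_weights(conn, tokens, stop_rank=0):
    # Weight = primary key, so rarer words (later in the list) weigh more.
    # stop_rank is an assumed cutoff: ids <= stop_rank are treated as stop
    # words and dropped. Tokens missing from the vocabulary are ignored.
    weights = {}
    for tok in tokens:
        row = conn.execute("SELECT id FROM words WHERE word = ?", (tok,)).fetchone()
        if row is not None and row[0] > stop_rank:
            weights[tok] = row[0]
    return weights


def top_keywords(conn, text, k=3, stop_rank=0):
    # Naive whitespace tokenization; returns the k highest-weighted words.
    ws = word_weights(conn, text.lower().split(), stop_rank)
    return sorted(ws, key=ws.get, reverse=True)[:k]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Toy vocabulary, assumed to be in word2vec's output order (most frequent first).
    vocab = ["the", "of", "and", "to", "a", "in", "is",
             "model", "vector", "embedding", "word2vec"]
    build_vocab_table(conn, vocab)
    print(word_weights(conn, "the model is a vector".split()))
    # prints {'the': 1, 'model': 8, 'is': 7, 'a': 5, 'vector': 9}
    print(top_keywords(conn, "word2vec is a model of the embedding", k=3, stop_rank=7))
    # prints ['word2vec', 'embedding', 'model']
```

	Without a cutoff, common words are not removed, only given the smallest weights; the `stop_rank` threshold is one simple way to turn that ordering into an actual stop-word filter.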