Hacker News new | ask | show | jobs
by oelmekki 3159 days ago
Alternatively to tf-idf, there's an interesting property in word embeddings generated by word2vec : they're sorted by rarity (the most common words being on top of the list).

So if you insert them in the same order in a database, you can just use their primary key as weight for a word. This also has the advantage of filtering out stop words without any additional processing.