|
|
|
|
|
by oelmekki
3159 days ago
|
|
Alternatively to tf-idf, there's an interesting property in word embeddings generated by word2vec : they're sorted by rarity (the most common words being on top of the list). So if you insert them in the same order in a database, you can just use their primary key as weight for a word. This also has the advantage of filtering out stop words without any additional processing. |
|