by rpedela 3154 days ago
Each token would have one vector from word2vec. A token could be a word or a phrase, depending on the pre-processing. The words in a phrase are usually concatenated with an underscore. I recommend gensim if you want or need phrases.
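
To make the underscore convention concrete, here is a minimal sketch of the pre-processing step in plain Python. It is not gensim's API: `merge_phrases` and `KNOWN_PHRASES` are hypothetical names made up for illustration, and the phrase list is hard-coded, whereas a phrase-detection tool would learn it from corpus statistics.

```python
def merge_phrases(tokens, phrases, delimiter="_"):
    """Greedily join adjacent tokens that form a known two-word phrase."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in phrases:
            # Emit the phrase as a single token, e.g. "new_york".
            merged.append(delimiter.join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical, hand-written phrase list (an assumption for this sketch).
KNOWN_PHRASES = {("new", "york"), ("machine", "learning")}

tokens = "i study machine learning in new york".split()
merged_tokens = merge_phrases(tokens, KNOWN_PHRASES)
print(merged_tokens)
```

After this step, "machine_learning" and "new_york" are ordinary tokens, so a word2vec model trained on the merged text would learn one vector for each of them.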

Ah, you're right, word2vec assigns one vector to each word, as opposed to one vector to each meaning. Then the problem remains: we can't differentiate between homonyms.
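
A toy illustration of the homonym problem, assuming a static lookup table that stands in for a trained model. The vectors are made-up numbers and `EMBEDDINGS` and `embed` are hypothetical names, not part of any library.

```python
# Made-up 3-dimensional vectors; a real model would learn these from text.
EMBEDDINGS = {
    "bank": (0.2, -0.7, 0.5),
    "river": (0.9, 0.1, -0.3),
    "money": (-0.4, 0.8, 0.6),
}


def embed(tokens, table):
    """Look up one fixed vector per token, ignoring the surrounding context."""
    return [table[token] for token in tokens]


bank_by_river = embed(["river", "bank"], EMBEDDINGS)[1]
bank_by_money = embed(["money", "bank"], EMBEDDINGS)[1]

# Both senses of "bank" map to the same vector.
same_vector = bank_by_river == bank_by_money
print(same_vector)
```

Because the lookup depends only on the token string, the riverside "bank" and the financial "bank" are indistinguishable.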

But it seems it's been solved, too: https://github.com/sbos/AdaGram.jl

There is also sense2vec, which I think tries to do something similar: https://explosion.ai/blog/sense2vec-with-spacy