| HN Mirror



	by rpedela 3154 days ago
	Each token would have one vector from word2vec. A token could be a word or a phrase, depending on the pre-processing. The words in a phrase are usually concatenated with an underscore. I recommend gensim if you want or need phrases.
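
To make the underscore convention concrete, here is a minimal sketch of that pre-processing step in plain Python. It assumes you already have a hand-written set of phrases; the `merge_phrases` helper and the `PHRASES` set are made up for illustration and are not gensim's API (gensim learns which phrases to join from co-occurrence statistics rather than taking a fixed list).

```python
def merge_phrases(tokens, phrases, max_len=3):
    """Greedily join known multi-word phrases with underscores."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so a 3-word phrase beats its 2-word prefix.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                merged.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so keep the single word as its own token.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical phrase list, chosen only for this example.
PHRASES = {("new", "york"), ("machine", "learning")}

tokens = "i study machine learning in new york".split()
print(merge_phrases(tokens, PHRASES))
```

After this step "new_york" is a single token, so word2vec would train one vector for it, separate from the vectors for "new" and "york".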

1 comments

kirillkh 3154 days ago

Ah, you're right, word2vec assigns one vector to each word, as opposed to one vector to each meaning. Then the problem remains: we can't differentiate between homonyms.
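
A toy sketch of what I mean, with invented 2-d vectors (not real word2vec output). The `bank#finance` style keys are a hypothetical naming scheme used only to show what a per-meaning table would look like:

```python
# One vector per word: the lookup key is just the surface form.
# All numbers below are made up for illustration.
word_vectors = {
    "bank": [0.5, 0.5],
    "river": [0.0, 1.0],
    "money": [1.0, 0.0],
}

# "deposit money at the bank" and "sit on the river bank" use the same key,
# so both occurrences get the identical vector regardless of context.
financial = word_vectors["bank"]
riverside = word_vectors["bank"]
same_vector = financial == riverside

# One vector per meaning: each sense needs its own key (hypothetical scheme).
sense_vectors = {
    "bank#finance": [0.9, 0.1],
    "bank#river": [0.1, 0.9],
}
distinct = sense_vectors["bank#finance"] != sense_vectors["bank#river"]

print(same_vector, distinct)
```

The hard part, which this sketch skips entirely, is deciding which sense key applies to a given occurrence in running text.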

But it seems it's been solved, too: https://github.com/sbos/AdaGram.jl


rpedela 3154 days ago

There is also sense2vec, which I think tries to do something similar: https://explosion.ai/blog/sense2vec-with-spacy
