by boraturan 2541 days ago

I would be really interested to see how these analogies turn out with other word embedding methods (BERT, etc.) or with different implementations of word2vec.

As for different implementations of word2vec, I still don't understand why the original code uses two separate embeddings for target and context words. It could use a single shared embedding layer.
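To make the question concrete, here is a minimal sketch of the two scoring variants in plain Python. It is not the original C code; the helper names (`make_table`, `score_separate`, `score_shared`) and the toy sizes are my own assumptions for illustration.

```python
import math
import random

def make_table(vocab_size, dim, rng):
    # Hypothetical helper: a small random embedding table, one row per word id.
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_separate(w_in, w_out, target, context):
    # Two tables: target words are looked up in w_in, context words in w_out.
    return sigmoid(dot(w_in[target], w_out[context]))

def score_shared(w, target, context):
    # One table: both roles are looked up in the same matrix.
    return sigmoid(dot(w[target], w[context]))

rng = random.Random(0)
w_in = make_table(5, 4, rng)
w_out = make_table(5, 4, rng)

# A word scored against itself.
self_shared = score_shared(w_in, 2, 2)
self_separate = score_separate(w_in, w_out, 2, 2)
```

One difference that falls straight out of the sketch: with a shared table the score is symmetric in (target, context), and a word scored against itself gets sigmoid(|v|^2), which can never drop below 0.5. With two tables neither constraint applies. Whether that matters for embedding quality is exactly the open question.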

Also, for the layer that feeds the sigmoid (the dot product), why is cosine similarity not considered for measuring vector similarity?
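A small sketch of what changes when the dot product is swapped for cosine similarity, again in plain Python. The `scale` parameter is a hypothetical addition of mine, not something in the original code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Dot product of the length-normalized vectors; always in [-1, 1].
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_dot(u, v):
    return sigmoid(dot(u, v))

def prob_cosine(u, v, scale=1.0):
    # scale is a hypothetical temperature-like factor; 1.0 means plain cosine.
    return sigmoid(scale * cosine(u, v))

u = [3.0, 4.0]
v = [6.0, 8.0]  # same direction as u, twice the length

p_dot = prob_dot(u, v)
p_cos = prob_cosine(u, v)
```

Because cosine is bounded in [-1, 1], the plain cosine probability is confined to roughly 0.27 to 0.73, even for two vectors pointing in exactly the same direction as above. The dot product has no such bound, so the model can push probabilities toward 0 or 1 by growing vector norms. A scale factor would be one way to widen the cosine range, but that is my assumption, not something I have seen discussed for word2vec.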

I have searched for discussion of these issues and have not found detailed comments. I have trained word2vec embeddings with a shared embedding layer and cosine similarity. The vectors seem similar to those from the original code, but deciding which one is better needs more work.