by kawin
2791 days ago
Hi, first author here! Feel free to ask any questions.

TL;DR: We prove that linear word analogies hold over a set of ordered pairs (e.g., {(Paris, France), (Ottawa, Canada), ...}) in an SGNS or GloVe embedding space with no reconstruction error when PMI(x,y) + log p(x,y) is the same for every word pair (x,y). We call this term the csPMI (co-occurrence shifted PMI).

This has a number of interesting implications:

1. It implies that Pennington et al. (authors of GloVe) had the right intuition about why these analogies hold.
2. Adding two word vectors together to compose them makes sense, because you're implicitly downweighting the more frequent word -- like TF-IDF or SIF would do explicitly.
3. Using Euclidean distance to measure word dissimilarity makes sense, because the Euclidean distance is a linear function of the negative csPMI.
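To make the csPMI definition above concrete, here is a minimal sketch that computes PMI(x,y) + log p(x,y) from a table of co-occurrence counts and checks whether it is the same for every pair in a set. The counts are made up for illustration, and the helper names (`cspmi`, `same_cspmi`) are hypothetical, not taken from the paper. It checks only the condition as worded in the TL;DR; the paper may state further requirements.

```python
import math

def cspmi(counts, x, y):
    """csPMI(x, y) = PMI(x, y) + log p(x, y), estimated from pair counts.

    `counts` maps (word, context) pairs to co-occurrence counts.
    Probabilities are plain relative frequencies (an assumption of this sketch).
    """
    total = sum(counts.values())
    p_xy = counts[(x, y)] / total
    # Marginals: sum the joint counts over the other position.
    p_x = sum(c for (a, _), c in counts.items() if a == x) / total
    p_y = sum(c for (_, b), c in counts.items() if b == y) / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi + math.log(p_xy)

def same_cspmi(counts, pairs, tol=1e-9):
    """True if csPMI is (numerically) identical for every pair in `pairs`."""
    values = [cspmi(counts, x, y) for x, y in pairs]
    return all(math.isclose(v, values[0], abs_tol=tol) for v in values)

# Toy, made-up co-occurrence counts (not real corpus statistics).
toy_counts = {
    ("paris", "france"): 4,
    ("ottawa", "canada"): 4,
    ("paris", "canada"): 1,
    ("ottawa", "france"): 1,
}
pairs = [("paris", "france"), ("ottawa", "canada")]
condition_holds = same_cspmi(toy_counts, pairs)
```

In this toy table the two pairs are symmetric by construction, so they share the same csPMI. With real corpus counts the values would only be approximately equal, and `tol` would need to be loosened accordingly.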
The first question that comes to mind is whether this property and its implications might hold for deep "contextualized" word embeddings such as ELMo[a], which, as I'm sure you're aware, have proven superior to "shallow" word embeddings like Word2Vec/SGNS and GloVe in a growing range of NLP tasks.
A deep contextualized word embedding model maps words like "leaves" very differently depending on context. For example, the deep contextualized vector for the word "leaves" in the sentence "In the Fall, children love to play in the leaves" will be closer to the vector for "foliage" than to the vector for "leaves" in the sentence "Children don't like it when their father leaves for work," which will be closer to the vector for "departs."
I strongly suspect the csPMI property and its implications would hold for the pair (vector("leaves"), vector("foliage")) in the first case and for the pair (vector("leaves"), vector("departs")) in the second case.
What are your (speculative) thoughts on this?
[a] https://allennlp.org/elmo