| To oversimplify, I think the training set is something like: Italian restaurant is good. Chinese restaurant is good. Chinese government is bad. Mexican restaurant is good. Mexican drug dealers are bad. Mexican illegal immigrants are bad. And hence the word vector works as expected and the sentiment result follows. Update: To confirm my suspicion, I tried out an online demo to check distance between words in a trained word embedding model using word2vec: http://bionlp-www.utu.fi/wv_demo/ Here is an example output I got with Finnish 4B model (probably a bad choice since it is not English): italian, bad: 0.18492977 chinese, bad: 0.5144626 mexican, bad: 0.3288326 Same pairs with Google News model: italian, bad: 0.09307841 chinese, bad: 0.19638279 mexican, bad: 0.16298543 |