by Choco31415
3043 days ago
That is a slight problem. Disambiguation starts to get into higher-level contextual meaning, where we need to look at nearby words. This means there are likely some word vectors whose meanings are "muddled", so to speak. Although, I suppose if we treat "apple" and "Apple" as different words, that would help. Fun fact: one of the current NLP problems is detecting which words are names. Apparently it's really tough, especially with Twitter data!
|
Unfortunately, I see a problem in having to specify an exact position per word. If you think of the position of English "Apple" in the Spanish word space as a distribution instead of a specific location, then ideally it should be a bimodal (two-peak) distribution, with one peak next to Apple and one peak next to manzana. If you must use a normal distribution, the variance must be wide enough to cover both words, which is a huge problem, since (a) it spreads probability over a wide range of positions for a single word and (b) the mean (expected value) lies between the two peaks, not at the semantic location of "apple" at all.
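
To make the point about the mean concrete, here is a rough sketch in a made-up one-dimensional "word space". The positions, the peak width, and the helper names (`sample_bimodal`, `mixture_pdf`) are all assumptions for illustration, not anything from a real embedding model. It draws samples from a two-peak mixture, fits a single normal distribution to them, and compares how plausible the fitted mean is under the true two-peak distribution.

```python
import math
import random
import statistics

# Hypothetical 1-D "Spanish word space"; all positions are invented.
POS_APPLE_COMPANY = -5.0  # assumed location near "Apple" (the company)
POS_MANZANA = 5.0         # assumed location near "manzana" (the fruit)
PEAK_STD = 0.5            # assumed spread of each peak


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def mixture_pdf(x):
    """Density of the equal-weight two-peak mixture at x."""
    return 0.5 * normal_pdf(x, POS_APPLE_COMPANY, PEAK_STD) + 0.5 * normal_pdf(
        x, POS_MANZANA, PEAK_STD
    )


def sample_bimodal(n, rng):
    """Draw n positions: pick one of the two peaks, then add Gaussian noise."""
    return [
        rng.gauss(rng.choice((POS_APPLE_COMPANY, POS_MANZANA)), PEAK_STD)
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_bimodal(10_000, rng)

# Fitting a single normal distribution amounts to taking the sample
# mean and standard deviation.
fit_mean = statistics.fmean(samples)
fit_std = statistics.pstdev(samples)

print(f"fitted mean: {fit_mean:.3f}, fitted std: {fit_std:.3f}")
print(f"true density at fitted mean: {mixture_pdf(fit_mean):.3e}")
print(f"true density at a peak:      {mixture_pdf(POS_MANZANA):.3e}")
```

Under these assumptions the fitted standard deviation comes out roughly as large as the distance from the midpoint to either peak, and the fitted mean sits in the gap where the two-peak distribution puts almost no probability, which is the (a) and (b) problem in miniature.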