If Apple and peach had very similar word vectors, an English apple and a French peach would have the same too. And there is a risk of mistranslatuon. How is that situation handled?
It is handled by supervised training with paired translations, so that English apple will be associated with French pomme instead of other fruits. If you don't have a parallel corpus, translation gets significantly harder. I'm actually more amazed that it's possible at all.
If you know beforehand what kind of document you are dealing with, you can refine the word vectors for your given task, or even train them from scratch if you have enough data. In general, though, you'll end up with a mixture of meanings. The vector for "Apple" would be somewhere between fruits and companies, while the vector for "Amazon" would be somewhere between rivers and companies.
An interesting paper looked at how these associations changed over time [1]. It was also featured recently on The Morning Paper [2], in case you prefer a summary with added context.
Although those ambiguities make things a bit more difficult, you can usually leave the job of disentangling them to a later stage in the language-modeling process, which will have more context it can use to disambiguate which word sense was used.
That is a slight problem. Disambiguates start to dive into higher contextual meaning where we need to look at nearby words. This means there are likely some word vectors whose meanings are "muddled", per se.
Although, I suppose if we treat "apple" and "Apple" as different words, that would help.
Fun fact: One of the current NLP problems is detecting which words are names. Apparently it's really tough, especially with Twitter data!
I suppose that if you're doing multilanguage, this problem partially sorts itself out. E.g. Spanish there will be Apple and manzana, in two different places due to their different semantics. Now for English, say you were trying to place "apple" in that space, you would want to put it next to both of them.
Unfortunately I see a problem in having to specify an exact position per word. If you think of the position of english "Apple" in the Spanish word space as a distribution instead of a specific location, then it ideally should be a two-mode distribution, with one peak next to Apple and one peak next to manzana. If you must use a normal distribution, the variance must be wide enough to cover both words -- a huge problem, since (a) that assigns a lot of probable values to one word and (b) the mean value (expected value) lies between them, not at the semantic location of "apple" at all.