by yorwba 3043 days ago

If you know beforehand what kind of document you are dealing with, you can refine the word vectors for your given task, or even train them from scratch if you have enough data. In general, though, you'll end up with a mixture of meanings. The vector for "Apple" would be somewhere between fruits and companies, while the vector for "Amazon" would be somewhere between rivers and companies.
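To make the "mixture of meanings" concrete, here is a toy sketch. Everything in it is made up for illustration: the two-dimensional `FRUIT` and `COMPANY` directions, the `mixed_vector` helper, and the assumption that the blend is a simple frequency-weighted average. Real embeddings have hundreds of dimensions and are learned, not constructed like this.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "pure sense" directions (assumption, not real embeddings).
FRUIT = [1.0, 0.0]
COMPANY = [0.0, 1.0]

def mixed_vector(p_fruit):
    """Blend the two sense directions by how often each sense occurs."""
    return [p_fruit * f + (1.0 - p_fruit) * c for f, c in zip(FRUIT, COMPANY)]

# Assumed: both senses of "Apple" are equally frequent in the corpus.
apple = mixed_vector(0.5)
sim_fruit = cosine(apple, FRUIT)
sim_company = cosine(apple, COMPANY)

# A corpus dominated by recipes would pull the vector toward the fruit sense.
apple_in_recipes = mixed_vector(0.9)
```

With equal frequencies, `apple` is equally similar to both senses and identical to neither. Shifting the mix, as in `apple_in_recipes`, is roughly what refining the vectors on task-specific documents would do.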

An interesting paper looked at how these associations changed over time [1]. It was also featured recently on The Morning Paper [2], in case you prefer a summary with added context.

Although those ambiguities make things a bit more difficult, you can usually leave the job of disentangling them to a later stage in the language-modeling process, which has more context available to work out which word sense was meant.
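As a rough sketch of what "using more context" could look like, the snippet below picks a sense by comparing the average of the surrounding words' vectors against one prototype per sense. The vectors, the `SENSES` and `CONTEXT_VECTORS` tables, and the `guess_sense` helper are all hypothetical. An actual language model does this implicitly in its later layers rather than with an explicit lookup like this.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D prototypes for the two senses of "Apple" (assumption).
SENSES = {"fruit": [1.0, 0.0], "company": [0.0, 1.0]}

# Hypothetical vectors for a few context words (assumption).
CONTEXT_VECTORS = {
    "pie": [0.9, 0.1],
    "juice": [0.8, 0.2],
    "shares": [0.1, 0.9],
    "iphone": [0.2, 0.8],
}

def guess_sense(context_words):
    """Return the sense whose prototype is closest to the averaged context."""
    known = [CONTEXT_VECTORS[w] for w in context_words if w in CONTEXT_VECTORS]
    if not known:
        return None  # no usable context, so the ambiguity stays unresolved
    centroid = [sum(dim) / len(known) for dim in zip(*known)]
    return max(SENSES, key=lambda sense: cosine(centroid, SENSES[sense]))
```

For example, `guess_sense(["pie", "juice"])` lands on the fruit sense, while `guess_sense(["shares", "iphone"])` lands on the company sense.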

[1] https://arxiv.org/abs/1703.00607

[2] https://blog.acolyer.org/2018/02/22/dynamic-word-embeddings-...