by throw156754228
697 days ago
The model of a word to a vector breaks down really quickly once you introduce the context and complexity of human language. That's why we went to contextual embeddings, but even they have issues. Curious: would it handle negation of trained keywords, e.g. "not urgent"?

For the drawbacks:
Word embeddings are only good at similarity-search-style queries, stuff like paraphrase matching.
They'll necessarily struggle with negation. Since word embeddings are generally summed or averaged into a sentence embedding, adding "not" only nudges the sentence vector a little, whereas in an LM embedding it can shift the whole representation.
Homonyms are also an issue, but this is massively overblown as a reason to use LM embeddings (at least for Latin/Germanic languages).
Most people use LM embeddings because other people have told them it's the best thing, rather than because they benchmarked accuracy and performance for their use case.
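To make the negation point concrete, here is a minimal sketch of mean-pooled word vectors. The 3-d vectors and the names `TOY_VECTORS`, `sentence_embedding` and `cosine` are made up for illustration (they are not from any real model or library); the only assumption is that the sentence embedding is a plain average of the word vectors.

```python
import math

# Hypothetical 3-d word vectors, invented for illustration only.
# Real word embeddings have hundreds of dimensions and come from training.
TOY_VECTORS = {
    "this":   [0.1, 0.0, 0.2],
    "is":     [0.0, 0.1, 0.1],
    "urgent": [0.9, 0.8, 0.1],
    "not":    [0.1, -0.1, 0.3],
}


def sentence_embedding(tokens, vectors):
    """Average the word vectors of the tokens (bag-of-words mean pooling)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(vectors[tok]):
            total[i] += x
    return [x / len(tokens) for x in total]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


plain = sentence_embedding(["this", "is", "urgent"], TOY_VECTORS)
negated = sentence_embedding(["this", "is", "not", "urgent"], TOY_VECTORS)
# Same words as `negated`, different order: averaging ignores word order.
reordered = sentence_embedding(["urgent", "is", "not", "this"], TOY_VECTORS)

similarity = cosine(plain, negated)
print(f"cosine(plain, negated) = {similarity:.3f}")
```

With these toy numbers the negated sentence stays above 0.9 cosine similarity to the original, and reordering the words gives the same embedding. "not" is just one more vector in the average, and nothing ties it to the word it negates.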
1. https://github.com/oborchers/Fast_Sentence_Embeddings
2. https://www.sbert.net/docs/sentence_transformer/pretrained_m...