|
|
|
|
|
by spott
1022 days ago
|
|
You probably aren’t using an LLM for your text embeddings for document retrieval (they don’t perform as well as specialist embedding models[0]), and even if they did, you have an embedding about a bare document, without any context of what you are trying to get out of it. If you were to add your context in and then get an embedding, you would get a different answer. As your query gets specific, irrelevant aspects of the embedding space can overwhelm the similarity function, leading to irrelevant answers that are still semantically similar. [0] https://huggingface.co/spaces/mteb/leaderboard |
|
They did it with a custom language model. I really want to give this a try with llama2 embeddings but haven't had the bandwidth yet (and llama2's embedding vectors are inconveniently huge, but that's a different problem).