by deckar01
916 days ago
There are two different language models involved. The embedding model tries to capture too many irrelevant aspects of the prompt, which ends up placing it close to seemingly random documents. Inverting the question into the LLM’s blind guess and distilling that down to keywords makes the embedding very sparse and specific. A popular strategy has been to invert the documents into questions during initial embedding, but I think that is a performance hack that still suffers from sentence prompts being bad vector indexes.
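To make the idea concrete, here is a rough sketch, not anyone's actual pipeline. It assumes a toy bag-of-words counter as a stand-in for a real embedding model, and a hardcoded `blind_guess` string in place of a real LLM call. The `docs`, `STOPWORDS`, and all function names are made up for illustration. It only shows how a sentence-shaped question can land near the wrong document on filler words, while keywords distilled from a guessed answer land near the right one.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, just large enough for this toy example.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "it", "and",
             "or", "how", "do", "i", "my", "what", "can", "when", "with"}


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def distill_keywords(text):
    """Drop stopwords and duplicates, keeping first-seen order."""
    kept = []
    for t in tokens(text):
        if t not in STOPWORDS and t not in kept:
            kept.append(t)
    return " ".join(kept)


def best_match(query, docs):
    """Return the name of the document closest to the query."""
    q = embed(query)
    return max(docs, key=lambda name: cosine(q, embed(docs[name])))


docs = {
    "cache_doc": "Redis cache eviction uses an LRU policy with a "
                 "configurable maxmemory limit.",
    "faq_doc": "How do I reset my password? What can I do when it is "
               "not in the email?",
}

question = "How do I stop it running out of memory, and what is the limit?"

# Assumption: this string is what an LLM might guess with no context.
# A real system would call a model here instead.
blind_guess = "Set the maxmemory limit in Redis and pick an LRU eviction policy."

print(best_match(question, docs))
print(best_match(distill_keywords(blind_guess), docs))
```

In this toy setup the raw question matches the FAQ document because they share question-shaped filler words, and the distilled keywords match the cache document. A real embedding model is not a word counter, so treat this as an analogy for the "irrelevant aspects" problem rather than a measurement of it.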
Turning the docs into questions is something I will test on my own data (I'm just learning and getting a feel for this).
I am intrigued... what makes a good vector index??