by Xenoamorphous
811 days ago
When I first played with RAG I thought “wow, this is so cool”. Now I’m starting to think it’s kinda useless, in the sense that the critical bit is the initial search, and that doesn’t use the LLM’s power, or at most uses it to capture the user intent and reformulate the query. We’re building some “smart search” functionality for some teams, and I’m starting to wonder whether a traditional search results list showing the document chunks (i.e. sans the LLM, or with the LLM used only to rewrite the user query) wouldn’t be better than blindly taking the top N and feeding them to the LLM to produce some response. E.g. we have some docs about specific supermarket chains in which the word “supermarket” might not appear at all, but the user query might be “show me what we have about supermarkets”. Now, the embeddings will hopefully place the word “supermarket” close to, say, “Costco”, but they might also place it closer to “shopping center”, and we might have docs about shopping centers that could rank higher. So we might take the top 5 docs and send them to the LLM, but the docs the user was after might have been in 7th and 9th position, never seen by either the LLM or the user.
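To make the cutoff problem concrete, here is a toy sketch of measuring recall at a given N. The doc ids and the ranking are made up for illustration (they are not from a real system), and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the first k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# Hypothetical ranking for "show me what we have about supermarkets":
# shopping-center docs outrank the supermarket-chain docs, which land
# in 7th and 9th position.
ranked = [
    "mall_1", "mall_2", "mall_3", "mall_4", "mall_5",
    "mall_6", "costco", "mall_7", "tesco", "mall_8",
]
relevant = {"costco", "tesco"}

print(recall_at_k(ranked, relevant, 5))   # 0.0: the LLM never sees either doc
print(recall_at_k(ranked, relevant, 10))  # 1.0: a results list would show both
```

With a plain results list the user can at least scroll down to positions 7 and 9; with a top-5 cutoff feeding the LLM, those docs are simply gone.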
|
Vector retrieval that isn’t contextualized in the domain is usually bad (RAG solutions call this “naive RAG” … and make up for it with funky chunking and retrieval ensembles). Training custom retrievers and rerankers is often key, but it’s quite an effort and still hard to generalize in a domain with broad knowledge.
Lexical search provides nice guarantees and deterministic control over results (depending on how you index). Advanced querying capability is certainly useful here. Constructing/enriching queries with transformers is cool.
Reranking is often a nice addition to the ensemble, though it can be done with smaller models.
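As a rough sketch of what a retrieval ensemble can look like, the snippet below merges a lexical ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is just one common fusion method, picked here for illustration. The doc ids and both rankings are made up, and the lexical ranking is assumed to come from a query that was already enriched with chain names.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranked list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for "show me what we have about supermarkets".
# Lexical: assumes the query was expanded with chain names beforehand.
lexical = ["costco_report", "tesco_notes"]
# Vector: shopping-center docs happen to embed closer to the query.
vector = ["shopping_center_q3", "mall_survey", "costco_report", "tesco_notes"]

fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Docs that show up in both lists get lifted above docs that only one retriever liked, so the supermarket-chain docs end up ahead of the shopping-center ones here. A smaller reranker model could then reorder just the fused head of the list.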