by Xenoamorphous 811 days ago
When I first played with RAG I thought “wow this is so cool”. Now I’m starting to think it’s kinda useless, in the sense that the critical bit is the initial search, and that doesn’t use the LLM power, or at most it’s used to capture the user intent and reformulate the query.

We’re building some “smart search” functionality for some teams and I start to wonder if a traditional search results list (i.e. sans the LLM, or used only to rewrite the user query) with the document chunks wouldn’t be better than blindly taking the top N and feeding them to the LLM to produce some response.

E.g. we have some docs about specific supermarket chains in which the word “supermarket” might not appear at all, while the user query might be “show me what we have about supermarkets”. Now the embeddings hopefully will place the word “supermarket” close to, say, “Costco”, but they might place it even closer to “shopping center”, and we might have docs about shopping centers that could rank higher. So we might take the top 5 docs and send them to the LLM, but the docs the user was after might have been in 7th and 9th position, seen by neither the LLM nor the user.
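
To make the cutoff problem concrete, here is a minimal sketch. The document IDs and their order are invented for illustration (they are not real results), and `split_at_cutoff` and `recall_at_n` are hypothetical helper names.

```python
def split_at_cutoff(ranked_ids, top_n):
    """Return (what the LLM gets, what is silently dropped) for a ranked list."""
    return ranked_ids[:top_n], ranked_ids[top_n:]


def recall_at_n(ranked_ids, relevant_ids, top_n):
    """Fraction of the relevant docs that survive the top-N cutoff."""
    seen, _ = split_at_cutoff(ranked_ids, top_n)
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(seen)) / len(relevant)


# Invented ranking for the query "supermarkets": shopping-center docs
# outrank the supermarket-chain docs, which land in 7th and 9th position.
ranked = [
    "mall-a", "mall-b", "retail-park", "mall-c", "outlet",
    "high-street", "costco-notes", "mall-d", "tesco-notes", "misc",
]
relevant = ["costco-notes", "tesco-notes"]

for n in (5, 7, 10):
    print(n, recall_at_n(ranked, relevant, n))
```

With a cutoff of 5, neither relevant doc reaches the LLM, whereas a plain results list showing all 10 would at least let the user scroll to them.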

3 comments

I’ve worked in enterprise search at scale, both with lexical (Lucene-based, e.g. Elasticsearch) and semantic search engines (vector retrieval).

Vector retrieval that isn’t contextualized in the domain is usually bad (RAG solutions call this “naive RAG” … and make up for it with funky chunking and retrieval ensembles). Training custom retrievers and rerankers is often key, but it is quite an effort and still hard to generalize in a domain with broad knowledge.

Lexical search provides nice guarantees and deterministic control over results (depending on how you index). Advanced querying capability is certainly useful here. Constructing/enriching queries with transformers is cool.

Reranking is often a nice addition to the ensemble, and it can be done with smaller models.
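
A rough sketch of that two-stage shape, assuming a cheap keyword match as the first stage and a pluggable scorer as the reranker. Everything here is a toy: `lexical_candidates` only stands in for a Lucene-style engine, and `jaccard` only stands in for a small reranking model. All names are made up for illustration.

```python
def lexical_candidates(query, docs, k=3):
    """First stage: keep docs sharing at least one term with the query,
    ordered by number of shared terms (toy stand-in for a lexical engine)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda sd: sd[0], reverse=True)  # stable sort keeps ties in input order
    return [d for _, d in hits[:k]]


def jaccard(query, doc):
    """Toy relevance score; a real setup would call a small reranking model here."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query, candidates, score_fn):
    """Second stage: reorder the candidate set with a different scorer."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)


docs = [
    "costco return policy for members",
    "costco",
    "return policy overview and costco history plus unrelated filler text",
    "parking rules",
]
query = "costco return policy"
first_stage = lexical_candidates(query, docs)
final = rerank(query, first_stage, jaccard)
```

The first stage gives the deterministic guarantee (a doc with no matching term never shows up), and the reranker only reorders what the first stage already admitted.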

> We’re building some “smart search” functionality for some teams and I start to wonder if a traditional search results list (i.e. sans the LLM, or used only to rewrite the user query) with the document chunks wouldn’t be better than blindly taking the top N and feeding them to the LLM to produce some response.

Yep, it's a pretty common pattern: query -> embeddings -> vector db -> records -> context -> LLM -> result.
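
Spelled out as a minimal sketch, with loud assumptions: `embed` is a bag-of-words toy standing in for a real embedding model, the "vector db" is just a Python list, and `fake_llm` is a stub that returns a placeholder string. None of these names come from a real library.

```python
import math
from collections import Counter


def embed(text):
    """query -> embeddings (toy: term counts instead of a learned vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_n=2):
    """embeddings -> vector db -> records (toy: brute-force scan of a list)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_n]


def build_prompt(query, records):
    """records -> context"""
    context = "\n".join(f"- {r}" for r in records)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def fake_llm(prompt):
    """context -> LLM -> result (stub; swap in a real model call)."""
    return f"[stub LLM answer based on {len(prompt)} prompt characters]"


def rag(query, chunks, top_n=2):
    return fake_llm(build_prompt(query, retrieve(query, chunks, top_n)))
```

Note that the LLM only appears in the last step; everything that decides which chunks it sees happens in `retrieve`.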

Yes that’s basically the RAG pattern, but I’ve edited my comment to elaborate a bit. I’m questioning what the LLM brings to the table vs just showing the search results (a long list not limited by context length) to the user.

The LLM doesn’t even get the full docs most of the time, just chunks. It has a very narrow view so its full power is not used.

Another approach is to take the user query, have the LLM guess the answer and use that guessed answer for the RAG step.
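
A minimal sketch of that idea (sometimes described as hypothetical document embeddings), under heavy assumptions: `guess_answer` is a stub returning a canned string where a real LLM call would go, `embed` is a bag-of-words toy rather than a real embedding model, and the docs are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: term counts (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def guess_answer(query):
    """Stub for 'have the LLM guess the answer'. The canned text is invented."""
    return (
        "we have documents about chains such as costco and tesco "
        "including store audit and pricing notes"
    )


def retrieve_by_guess(query, docs, top_n=2):
    """Search with the guessed answer instead of the raw user query."""
    g = embed(guess_answer(query))
    ranked = sorted(docs, key=lambda d: cosine(g, embed(d)), reverse=True)
    return ranked[:top_n]


DOCS = [
    "shopping center footfall report",
    "costco store audit 2023",
    "tesco pricing review",
]
QUERY = "show me what we have about supermarkets"
top = retrieve_by_guess(QUERY, DOCS)
```

In this toy setup the raw query shares no terms with any doc, while the guessed answer pulls in “costco” and “tesco”, so the chain docs rank above the shopping-center one. Whether that holds with a real model depends entirely on how good the guess is.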