
by llm_nerd 388 days ago

What you described is RAG. Inefficient RAG, but still RAG.

And it's inefficient in two ways:

- You're using extra tokens for every query, which adds up.

- You're making the LLM less precise by overloading it with potentially irrelevant extra info, which makes it harder for it to find the needle in the haystack: the specific relevant answer.

Filtering (e.g. embedding similarity & BM25) and re-ranking/pruning what you provide to the LLM is an optimization. It cuts the token count and the processing time and, in an ideal world, improves the answer. Most LLMs are far more effective if the context you supply is limited to what is relevant to the question.
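A minimal sketch of that filtering step, assuming a tiny in-memory corpus and a hand-rolled Okapi BM25. The names `bm25_scores` and `top_k` are made up for illustration, whitespace tokenization is a simplification, and `k1=1.5`, `b=0.75` are just common defaults, not anything from the setup being discussed:

```python
import math
from collections import Counter


def tokenize(text):
    # Simplification: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n

    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def top_k(query, docs, k=2):
    """Keep only the k best-scoring docs that match the query at all."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k] if scores[i] > 0]
```

Only what `top_k` returns would go into the prompt, instead of every document. An embedding-similarity filter or a re-ranker would slot into the same place.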


TZubiri 388 days ago

I don't think it's RAG. RAG is specifically separating the search space from the LLM context window or training set and giving the LLM tools to search at inference time.


llm_nerd 388 days ago

In this case their Retrieval stage is "SELECT *", basically, so sure I'm being loose with the terminology, but otherwise it's just a non-selective RAG. Okay ..AG.

RAG is selecting pertinent information to supply to the LLM with your query. In this case they decided that everything was pertinent, and the net result is just reduced efficiency. But if it works for them, eh.


TZubiri 387 days ago

I'm not sure we are talking about the same thing. The root comment talks about concatenating all doc files into one long text string and adding that as a system/user prompt to the LLM at inference time, before the actual question.

You mention the retrieval stage being a SELECT *? I don't think there's any SQL involved here.


llm_nerd 387 days ago

I was being rhetorical. The R in RAG is filtering augmentation data (the A) for things that might or might not be related to the query. Including everything is just a lazy form of RAG -- the rhetorical SELECT *.

>and adding that as a system/user prompt to the LLM at inference time

You understand this is all RAG is, right? RAG is any additional system to provide contextually relevant (and often more timely) supporting information to a baked model.

People sometimes project RAG out to be a specific combination of embeddings, chunking, vector DBs, etc. But that is ancillary. RAG is simply selecting the augmentation data and supplying it with the question.
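To make that framing concrete, here is a rough sketch in which the retrieval step is just a pluggable function. Everything in it (`select_all`, `keyword_overlap`, `build_prompt`, the prompt layout) is hypothetical and only illustrates the argument; it is not how any particular system does it:

```python
from typing import Callable, List

# A retriever takes the question and the corpus and returns the docs to supply.
Retriever = Callable[[str, List[str]], List[str]]


def select_all(question: str, docs: List[str]) -> List[str]:
    # The rhetorical SELECT *: everything is treated as pertinent.
    return list(docs)


def keyword_overlap(question: str, docs: List[str]) -> List[str]:
    # A crude selective retriever: keep docs sharing at least one word
    # with the question.
    terms = set(question.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]


def build_prompt(question: str, docs: List[str], retrieve: Retriever) -> str:
    # Augmentation: the selected docs are placed ahead of the question.
    context = "\n\n".join(retrieve(question, docs))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping `select_all` for `keyword_overlap` changes only how much context reaches the model; the rest of the pipeline is identical.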

Anyways, I think this thread has reached a conclusion and there really isn't much more value in it. Cheers.


cluckindan 386 days ago

The "retrieval" refers to information retrieval, which is a technical term:

https://en.wikipedia.org/wiki/Information_retrieval

In that sense, calling "stuff everything in the context" LLM queries a RAG system is analogous to calling a web crawler a search engine.


llm_nerd 386 days ago

Sure thing.


TZubiri 387 days ago

I agree it isn't embeddings or vector DBs.

I personally define it as excluding the case where all the data is loaded into the context window.

It's a very new field without a lot of reliable sources. It would be worth standardizing the meaning.
