by llm_nerd
341 days ago
What you described is RAG. Inefficient RAG, but still RAG. And it's inefficient in two ways:
- You're using extra tokens on every query, which adds up.
- You're making the LLM less precise by overloading it with potentially irrelevant extra info, which makes it harder for it to pick the needle (the specific relevant answer) out of the haystack.

Filtering (e.g. embedding similarity and BM25) and re-ranking/pruning what you put into the context is an optimization. It cuts the tokens and the processing time and, in an ideal world, improves the answer. Most LLMs are far more effective if the retrieved context is limited to what is relevant to the question.
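To make the filtering step concrete, here is a rough sketch of BM25 scoring plus top-k pruning in plain Python. Everything named here (`bm25_scores`, `select_context`, `DOCS`) is made up for illustration, the tokenizer is a naive whitespace split, and `k1=1.5`, `b=0.75` are just commonly used defaults. A real setup would more likely use a search library and combine this with embedding similarity.

```python
import math
from collections import Counter

# Hypothetical mini corpus, purely for illustration.
DOCS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Refund requests are processed in five business days.",
    "The cafeteria menu changes weekly.",
]

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with BM25 (Lucene-style IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def select_context(query, docs, top_k=2):
    """Keep only the top_k docs that actually match the query."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k] if scores[i] > 0]

if __name__ == "__main__":
    query = "refund policy returns"
    context = select_context(query, DOCS)
    # Only the pruned context goes into the prompt, not the whole corpus.
    print("\n".join(context))
```

The point is the last step: the prompt gets the two refund-related snippets instead of all four documents, so fewer tokens are spent and there is less irrelevant text for the model to wade through.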