Hacker News
by freedmand 1013 days ago
I don’t fully understand the fascination with retrieval augmented generation. The retrieval part is already really good and computationally inexpensive — why not just pass the semantic search results to the user in a pleasant interface and allow them to synthesize their own response? Reading a generated paragraph that obscures the full sourcing seems like a practice that’s been popularized to justify using the shiny new tech, but is the generated part what users actually want? (Not to mention there is no bulletproof way to prevent hallucinations, lies, and prompt injection even with retrieval context.)
6 comments

On the modeling side, it's compelling to separate the memory from the linguistic skills. Vector search is hella fast and can be very good. So you can offload the memorization part of the problem and let the language model focus on the language. This should allow better performance with much smaller models.
I really like using LLMs to learn stuff because they can explain anything at the exact level I need. Hallucination is a big problem with that and RAG pretty much solves it. If I give ChatGPT a good Stack Overflow post and tell it to dumb it down for me, it does very well. RAG just automates that process with the added benefit of not letting the LLM decide which information to retrieve, which should greatly reduce the chance of accidentally biasing the model with your prompt.
In a strict "one question / one response" search, raw semantic search results are a great solution. They also consume far fewer tokens.
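To make the "RAG just automates that process" point concrete, here is a rough sketch of the retrieve-then-prompt step. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names, not any library's API. The resulting string would then be sent to whatever LLM you use.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model (assumption): word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    # The search step picks the sources; the LLM never chooses them.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    # Number the sources so the answer can cite them back to the user.
    sources = retrieve(query, docs, k)
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer using only these sources and cite them by number.\n"
        f"{numbered}\n\nQuestion: {query}"
    )
```

Numbering the sources in the prompt is one way to keep the sourcing visible in the generated answer, which is the concern raised in the original post.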

In conversational AI, providing search results appended to a long-memory context produces "human-like" results.
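As a minimal sketch of what "search results appended to a long-memory context" could look like, assuming the history is a list of `(role, text)` pairs and the search results are plain strings (the function name `assemble_context` and the `max_turns` cutoff are made up for this example):

```python
def assemble_context(history, search_results, user_message, max_turns=4):
    # Keep only the most recent turns so the context stays bounded.
    recent = history[-max_turns:]
    lines = [f"{role}: {text}" for role, text in recent]
    # Append this turn's search results after the remembered conversation.
    lines += [f"source: {result}" for result in search_results]
    lines.append(f"user: {user_message}")
    return "\n".join(lines)
```

A real system would more likely trim by token count than by number of turns; the turn cutoff here just keeps the sketch short.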

The main reason is that you might not want the raw information but some reasoning on top of it. An LLM draws not only on the context but on everything it was trained on. For example, a math student asking a question doesn't want the raw theorems but some reasoning with them, and current LLMs can do that. They will sometimes make mistakes because of hallucinations, but for questions that aren't very difficult they usually give the right answer. That helps a lot when you are not an expert in the domain, and it is why GPT-4 is a great tool for students: it helps you understand the basics as if you had a teacher with you.
Sometimes what I want is to ask Google/Alexa/Siri a question and get a summary response along with the source. I think that would be a good application of the above.

Less so IMO when I’m on my phone or in front of the computer.

For me, the #1 advantage is being able to ask follow-up questions.