Been watching the RAG (Retrieval-Augmented Generation) wave crash into production for over a year now. But something keeps bugging me:
Most setups still feel like glorified notebooks stitched together with hope and vector search. Yeah, it "works" — until you actually need it to.
Suddenly: irrelevant chunks, hallucinations, shallow query rewriting, no memory loop, and a retrieval stack that breaks if you breathe on it wrong. We’ve got:
• pipelines that don’t align with what users actually want to ask,
• retrieval that acts more like a search engine than a reasoning aid,
• brittle evals (because "correct context" ≠ "correct answer"),
• and no one’s sure where grounding ends and illusion begins.
Sure, you can make it work, if you’re okay duct-taping every component and babysitting the system 24/7. So I gotta ask:
Is RAG just stuck in prototype land pretending to be production?
Or has someone here actually built a setup that survives user chaos and edge cases?
Would love to hear what’s worked, what hasn’t, and what you had to throw away. Not pushing anything, just been knee-deep in this and looking to sanity-check with folks who’ve actually shipped stuff.
RAG is part of the solution: it provides the style, formatting, and subject-matter idiosyncrasies the domain requires.
But doing (prompt + RAG query on that prompt) alone isn't enough. We have a handwritten series of prompts, so the user input is just one step in a branching decision tree that decides which prompts to apply, both in sequence (prompt 1's output becomes prompt 2's input) and in composition (e.g. combining prompts 3 and 5, but not prompt 4).
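A minimal Python sketch of the prompt-tree idea from the reply, under loud assumptions: `PROMPTS`, `CORPUS`, `route`, `compose`, `run`, `fake_llm`, and `fake_retrieve` are all hypothetical names; the model and retriever are stubs (no real LLM or vector store); and the routing rule is a toy keyword check, not the commenter's actual hand-written tree. It only illustrates the two mechanisms mentioned: sequencing (one call's output is the next call's input) and composition (prompts 3 and 5 merged into one call while prompt 4 is skipped).

```python
from typing import Callable, Dict, List

# Hypothetical prompt library; a real system would hold hand-written, domain-specific prompts.
PROMPTS: Dict[str, str] = {
    "p1": "Rewrite as a search query: {input}",
    "p2": "Answer using context [{context}]: {input}",
    "p3": "Use the house style.",
    "p4": "Be exhaustive.",
    "p5": "Cite sources.",
}

# Toy corpus standing in for a vector store (assumption, not real data).
CORPUS: List[str] = [
    "Invoices are due in 30 days.",
    "Refunds take 5 business days.",
    "Support hours are 9 to 5.",
]

# A plan is a sequence of steps; each step is a list of prompts composed into one call.
Plan = List[List[str]]


def fake_retrieve(query: str, k: int = 1) -> List[str]:
    """Toy lexical retrieval: rank documents by shared lowercase words (stand-in for vector search)."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def fake_llm(prompt: str) -> str:
    """Placeholder model call: echoes the first prompt line so the data flow is visible."""
    return f"[model output for: {prompt.splitlines()[0]}]"


def route(user_input: str) -> Plan:
    """Hypothetical branching rule; a real decision tree would have many more branches."""
    if "compare" in user_input.lower():
        # Sequence: rewrite the query first, then answer with prompts 3 + 5 (but not 4).
        return [["p1"], ["p2", "p3", "p5"]]
    return [["p2", "p3"]]


def compose(ids: List[str], text: str, context: str) -> str:
    """Composition: render each selected template and join them into a single prompt."""
    return "\n".join(PROMPTS[i].format(input=text, context=context) for i in ids)


def run(
    user_input: str,
    llm: Callable[[str], str] = fake_llm,
    retrieve: Callable[[str], List[str]] = fake_retrieve,
) -> str:
    """Walk the plan; each step's output becomes the next step's input."""
    text = user_input
    for step in route(user_input):
        # Retrieve against the current text, not just the raw user input.
        context = " | ".join(retrieve(text))
        text = llm(compose(step, text, context))
    return text


if __name__ == "__main__":
    print(run("Compare refunds and invoices"))
```

Swapping `fake_llm` and `fake_retrieve` for a real model client and vector-store query would keep the same shape. The design point is that the user's text is only the entry into `route`, not the whole prompt: the tree, not the user, decides which prompts run and in what order.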