| This glosses over a fundamental scaling problem that undermines the entire argument. The author's main example is Claude Code searching through local codebases with grep and ripgrep, and he extrapolates from this to claim RAG is dead for all document retrieval. That's a massive logical leap. Grep works great when you have thousands of files on a local filesystem that you can scan in milliseconds. But most enterprise RAG use cases involve millions of documents across distributed systems. Even with 2M token context windows, you can't fit an entire enterprise knowledge base into context. The author acknowledges this briefly ("might still use hybrid search") but then continues arguing RAG is obsolete. The bigger issue is semantic understanding. Grep does exact keyword matching. If a user searches for "revenue growth drivers" and the document discusses "factors contributing to increased sales," grep returns nothing. This is the vocabulary mismatch problem that embeddings actually solve. The author spent half the article complaining about RAG's limitations with this exact scenario (his $5.1B litigation example), then proposes grep as the solution, which would perform even worse. Also, the claim that "agentic search" replaces RAG is misleading. Recent research shows agentic RAG systems embed agents INTO the RAG pipeline to improve retrieval; they don't replace chunking and embeddings. LlamaIndex's "agentic retrieval" still uses vector databases and hybrid search, just with smarter routing. Context windows are impressive, but they're not magic. The article reads like someone who solved a specific problem (code search) and declared victory over a much broader domain. |
A great many pundits don't get that RAG means "a technique that enables large language models (LLMs) to retrieve and incorporate new information."
So RAG is a pattern that, as a principle, applies to almost every process. Context windows? OK, I won't get into all the nitty-gritty details here (embedded devices, small storage devices, security, RAM defects, the cost of storing separate contexts for different situations, etc.). Just a hint: the act of filling a context is what? Applied RAG.
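To make that concrete, here is a minimal sketch of what "filling a context" looks like. It is my own illustration, not anyone's actual pipeline: the names `retrieve` and `build_context` are made up, and the naive word-overlap scorer is only a stand-in for whatever search you really use (grep, BM25, embeddings, an agent). The point is that whatever picks the text that goes into the window is doing the "retrieval" part of RAG.

```python
def retrieve(query, documents, k=2):
    """Rank documents by shared words with the query (naive stand-in for any retriever)."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_words & set(doc.lower().split()))
        scored.append((overlap, doc))
    # Keep only documents sharing at least one word, highest overlap first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_context(query, documents, k=2):
    """Retrieve passages, then place them in the prompt: the 'augmentation' step."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical toy corpus, purely for illustration.
docs = [
    "Revenue grew 12% in the third quarter.",
    "The office cafeteria menu changes weekly.",
    "Factors contributing to increased sales include new pricing.",
]

prompt = build_context("what drove revenue growth", docs)
print(prompt)
```

Swap the scorer for anything you like and the shape stays the same: select something, put it in the window, generate. Note that this toy scorer only finds the first document and misses the third, since pure word matching knows nothing about synonyms.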
RAG is not an architecture; it is a principle, a structured approach. There is a reason why many people nowadays refer to RAG as a search engine.
As far as we know anything about knowledge, there is only one entity with an infinite context window. We still call it God, not the cloud.