We build a corporate RAG system for a government entity. Here is what I've learned so far from applying an experimental A/B testing approach to RAG, using RAGAS metrics:

- Hybrid retrieval (semantic + vector) followed by LLM-based reranking made no significant difference on synthetic eval questions.
- HyDE severely decreased both answer quality and retrieval quality, as measured with RAGAS on synthetic eval questions. (We still have to run a RAGAS eval with expert-written and real user questions.)

So yes, hybrid retrieval is always good; that's no news to anyone building production-ready or enterprise RAG solutions. But no single method always wins. We found the semantic search of Azure AI Search sufficient as a second method next to vector similarity. Others might find BM25 works great, or a fine-tuned SLM for query post-processing. It depends on the use case. Test, test, test.

Next things we're going to try:

- RAPTOR
- Self-RAG
- Agentic RAG
- Query refinement (expansion and sub-queries)
- GraphRAG

Learnings so far:

- Always run a baseline and an experiment, and try to refute your null hypothesis using measures such as RAGAS or others.
- Use three types of evaluation questions/answers:
  1. Expert-written Q&A
  2. Real user questions (from logs)
  3. Synthetic Q&A generated from your source documents
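To make the "refute your null hypothesis" step concrete, here is a minimal sketch, assuming you already have one RAGAS score per eval question (e.g. faithfulness) for both the baseline and the experiment, paired by question. It uses a paired sign-flip permutation test, which is one reasonable choice among several (a paired t-test or a Wilcoxon signed-rank test would also work). The function name `paired_permutation_test` and the scores in the usage part are hypothetical, not results from our runs.

```python
import random
from statistics import mean


def paired_permutation_test(baseline, experiment, n_resamples=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on per-question scores.

    Hypothetical helper: baseline[i] and experiment[i] must be metric scores
    (e.g. RAGAS faithfulness) for the SAME eval question i.
    Returns (mean difference experiment - baseline, p-value).
    """
    if len(baseline) != len(experiment) or not baseline:
        raise ValueError("need two non-empty, equally long, paired score lists")

    diffs = [e - b for b, e in zip(baseline, experiment)]
    observed = mean(diffs)

    rng = random.Random(seed)  # fixed seed so the test is reproducible
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 (no difference), the sign of each paired difference is
        # arbitrary, so flip each one with probability 0.5.
        resampled = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(resampled) >= abs(observed) - 1e-12:
            extreme += 1

    # +1 smoothing keeps the p-value strictly above 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical per-question scores, NOT real measurements.
    baseline_scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
    experiment_scores = [0.60, 0.75, 0.58, 0.79, 0.70, 0.73]
    diff, p = paired_permutation_test(baseline_scores, experiment_scores)
    print(f"mean diff = {diff:+.3f}, p = {p:.3f}")
```

Pairing by question matters here: each question is scored under both configurations, so the test looks at per-question differences instead of comparing two independent averages. With only a handful of questions the p-value will rarely fall below the usual 0.05, which is another reason to evaluate on all three question sets.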