I've built an advanced RAG (Retrieval-Augmented Generation) pipeline from scratch to demystify the mechanics of modern LLM-powered question-answering systems. This repository features:

-- An implementation, from scratch, of a sub-question query engine that answers complex user questions.
-- Illustrative explanations of the inner workings of the system.
-- An analysis of the challenges I faced while working with the system, such as prompt engineering and cost estimation.
-- A qualitative comparison with similar frameworks such as LlamaIndex, for a broader perspective.

Key takeaway: while modern QA pipelines with advanced RAG abstractions may seem complex, they are fundamentally powered by a series of LLM calls with meticulous prompt design.

I hope this repository provides intuitive insights for building more robust and efficient RAG systems. All feedback is welcome!
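To illustrate the key takeaway, here is a minimal Python sketch of a sub-question query engine. It assumes a generic `llm(prompt) -> str` callable and a toy word-overlap retriever. All names (`decompose`, `retrieve`, `answer_sub_question`, `sub_question_query`, the prompt templates, and the scripted stand-in model `ScriptedLLM`) are hypothetical and are not taken from the repository or from LlamaIndex. The sketch shows three kinds of LLM calls: one to decompose the question, one per sub-question to answer it over retrieved context, and one to synthesize the final answer.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt, returns the completion text.
LLM = Callable[[str], str]

DECOMPOSE_PROMPT = (
    "Break the following question into independent sub-questions, one per line.\n"
    "Question: {question}\nSub-questions:"
)
ANSWER_PROMPT = "Context:\n{context}\n\nAnswer the question: {question}"
SYNTHESIZE_PROMPT = (
    "Original question: {question}\n"
    "Sub-question answers:\n{answers}\n"
    "Write a final answer using only the answers above."
)


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(
        docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True
    )[:k]


def decompose(question: str, llm: LLM) -> List[str]:
    """First LLM call: split the question into sub-questions, one per line."""
    raw = llm(DECOMPOSE_PROMPT.format(question=question))
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]


def answer_sub_question(sub_question: str, docs: List[str], llm: LLM) -> str:
    """One LLM call per sub-question, grounded in retrieved context."""
    context = "\n".join(retrieve(sub_question, docs))
    return llm(ANSWER_PROMPT.format(context=context, question=sub_question))


def sub_question_query(question: str, docs: List[str], llm: LLM) -> str:
    sub_questions = decompose(question, llm)
    # Answered serially here, one sub-question after another.
    answers = [answer_sub_question(q, docs, llm) for q in sub_questions]
    joined = "\n".join(f"- {q} -> {a}" for q, a in zip(sub_questions, answers))
    # Final LLM call: synthesize the sub-answers into one response.
    return llm(SYNTHESIZE_PROMPT.format(question=question, answers=joined))


class ScriptedLLM:
    """Offline stand-in for a real model (hypothetical); records every prompt."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.calls.append(prompt)
        if prompt.startswith("Break"):
            return "- What was Company A revenue in 2021?\n- What was Company B revenue in 2021?"
        if prompt.startswith("Context"):
            return prompt.splitlines()[1]  # "answer" with the top retrieved passage
        return prompt  # synthesis: echo the prompt so the data flow is visible


if __name__ == "__main__":
    docs = [
        "Company A reported revenue of 10 in 2021.",
        "Company B reported revenue of 7 in 2021.",
    ]
    llm = ScriptedLLM()
    print(sub_question_query("Compare Company A and Company B revenue in 2021.", docs, llm))
    print(f"LLM calls: {len(llm.calls)}")  # 1 decompose + 2 answers + 1 synthesis
```

With N sub-questions, this sketch makes N + 2 LLM calls, which gives a simple starting point for cost estimation. To use a real model, replace `ScriptedLLM` with a function that calls your provider's API. The documents above are toy data.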
This seems very similar to LangSmith's trace monitoring, which I have been leaning on heavily for observability. You also mention LlamaIndex. How do you see your project fitting into the ecosystem?
I don't think I would be able to use this yet because it answers sub-questions serially. Is it possible to issue independent sub-question queries in parallel?
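Independent sub-questions can in principle be answered concurrently rather than one after another. Here is a minimal sketch using Python's `asyncio`. It assumes the model is exposed as a hypothetical async `llm(prompt)` coroutine function. `answer_all` and `SlowFakeLLM` are illustrative names, and the semaphore cap (`max_concurrency`) is an assumption meant to keep the number of in-flight requests under a provider's rate limit.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical async model interface: takes a prompt, resolves to completion text.
AsyncLLM = Callable[[str], Awaitable[str]]


async def answer_all(
    sub_questions: List[str], llm: AsyncLLM, max_concurrency: int = 4
) -> List[str]:
    """Answer independent sub-questions concurrently, preserving input order."""
    # The semaphore caps in-flight requests (e.g. to respect API rate limits).
    sem = asyncio.Semaphore(max_concurrency)

    async def answer_one(q: str) -> str:
        async with sem:
            return await llm(f"Answer the question: {q}")

    # gather runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(answer_one(q) for q in sub_questions)))


class SlowFakeLLM:
    """Offline stand-in that simulates network latency and tracks concurrency."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.max_in_flight = 0

    async def __call__(self, prompt: str) -> str:
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.05)  # pretend this is a slow API round trip
        self.in_flight -= 1
        return prompt.upper()


if __name__ == "__main__":
    llm = SlowFakeLLM()
    answers = asyncio.run(answer_all(["q1", "q2", "q3"], llm))
    print(answers)
    print(f"max concurrent calls: {llm.max_in_flight}")
```

This only works when sub-questions do not depend on each other's answers. A dependent sub-question would have to wait for the answers it needs. Because `asyncio.gather` returns results in the order the coroutines were passed, each answer still lines up with its sub-question for the synthesis step.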
In my experimental agent system, waggledance.ai[1], I have been working on a pre-agent step that picks and synthesizes the right context and tools[2] for a given subtask of a larger goal, and it seems to be boosting results. Now it looks like I have to add sub-question answering to the mix as well.
[1] demo - https://waggledance.ai
[2] relevant code sample - https://github.com/agi-merge/waggle-dance/blob/1b14163c24fd2...