Show HN: DuoRAG – A dual stack RAG that self-evolves
(github.com)
3 points
by cagz
86 days ago
Imagine a corpus of documents containing scientist biographies. Traditional RAG works fine until you ask questions like:
- "Who was born before 1800?"
- "How many are mathematicians?"
- "List names and birthdays for mathematicians"

These get incomplete answers because retrieval returns only the top-k chunks, and nothing in the response signals that anything is missing.

For an initial corpus, you can mitigate this by extracting metadata for a predetermined set of fields. That approach has two problems:
- You have to predict upfront every question that can be asked against the corpus.
- You have to keep revising that prediction as the documents change, e.g. adding Nobel prizes later, or extending the document set to include artists.

DuoRAG aims to solve both problems by:
- Running an initial metadata (schema) discovery before the first ingestion
- Self-updating the schema with candidate fields when it fails to answer a question
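
To make the failure mode and the two-stack idea concrete, here is a minimal sketch. It is not DuoRAG's actual code: `CORPUS`, `top_k`, `extract`, `build_table` and `answer` are hypothetical names, word overlap stands in for embedding search, and a regex stands in for whatever extractor (likely an LLM) a real system would use.

```python
import re

# Toy corpus: one short biography per document (layout is an assumption).
CORPUS = {
    "euler": "Leonhard Euler, mathematician, born 1707.",
    "lagrange": "Joseph-Louis Lagrange, mathematician, born 1736.",
    "gauss": "Carl Friedrich Gauss, mathematician, born 1777.",
    "curie": "Marie Curie, physicist, born 1867.",
    "noether": "Emmy Noether, mathematician, born 1882.",
}


def top_k(query, k=2):
    """Stand-in for vector search: rank documents by word overlap, keep only k."""
    q = set(re.findall(r"\w+", query.lower()))
    ranked = sorted(
        CORPUS,
        key=lambda d: len(q & set(re.findall(r"\w+", CORPUS[d].lower()))),
        reverse=True,
    )
    return ranked[:k]


def extract(text, schema):
    """Hypothetical metadata extractor for the fields currently in the schema."""
    row = {}
    if "profession" in schema:
        row["profession"] = text.split(",")[1].strip()
    if "birth_year" in schema:
        row["birth_year"] = int(re.search(r"born (\d{4})", text).group(1))
    return row


def build_table(schema):
    """Structured stack: one metadata row per document."""
    return {doc_id: extract(text, schema) for doc_id, text in CORPUS.items()}


def answer(table, schema, needed_field, predicate):
    """Filter the whole table; if the field is unknown, add it and re-extract."""
    if needed_field not in schema:
        schema.add(needed_field)      # schema self-update with a candidate field
        table = build_table(schema)   # re-ingest so the new field is populated
    hits = sorted(d for d, row in table.items() if predicate(row[needed_field]))
    return hits, table


if __name__ == "__main__":
    # Four documents match, but top-k silently returns only two of them.
    print(top_k("Who is a mathematician?", k=2))

    schema = {"profession"}           # result of an initial schema discovery
    table = build_table(schema)
    hits, table = answer(table, schema, "profession", lambda p: p == "mathematician")
    print(hits)                       # complete list, no top-k cutoff

    # "birth_year" is not in the schema yet, so the schema evolves first.
    hits, table = answer(table, schema, "birth_year", lambda y: y < 1800)
    print(hits, sorted(schema))
```

The point of the sketch is the contrast: the retrieval stack truncates at k without saying so, while the structured stack scans every row and can grow its schema when a question needs a field that was not predicted upfront.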