by jondwillis 972 days ago
This is a great README! It clearly breaks down some approaches to RAG. I also appreciate how you strive to de-mystify what’s going on under the hood, which is in many ways VERY simple.

This seems very similar to LangSmith’s trace monitoring, which I have been leaning on heavily for observability. You also mention LlamaIndex— how do you see your project fitting into the ecosystem?

I don’t think I would be able to use this yet because it is serial. Is it possible to issue independent sub-question queries non-serially?

In my experimental agent system, waggledance.ai[1], I have been working on a pre-agent step of picking and synthesizing the right context and tools[2] for a given subtask of a larger goal, and it seems to be boosting results. It looks like now I have to try sub-question answering in the mix as well.

[1] demo - https://waggledance.ai

[2] relevant code sample - https://github.com/agi-merge/waggle-dance/blob/1b14163c24fd2...

Thanks for the kind words and the great questions!

-- LlamaIndex has some excellent abstractions. In fact, I started off this project with LlamaIndex using their sub-question query engine. However, I found that the abstractions often obfuscate the prompt templates and the pipeline itself from the user. I found that writing my own pipeline was easier than trying to figure out how to engineer the prompts that LlamaIndex was using.

-- It is possible to non-serially issue independent sub-question queries (e.g., using async io). LlamaIndex does something similar. However, I would be extra careful while issuing parallel sub-queries due to the brittle nature of the system.
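To make the asyncio point concrete, here is a minimal sketch of issuing independent sub-questions concurrently. It is not code from this project or from LlamaIndex: `answer_sub_question` is a hypothetical stand-in for a retrieval plus LLM call, and `max_concurrency` is an assumed knob. Given how brittle these pipelines can be, the sketch caps concurrency and collects per-question failures instead of letting one bad sub-question abort the batch.

```python
import asyncio
from typing import List, Optional, Tuple


async def answer_sub_question(question: str) -> str:
    # Hypothetical stand-in for a retrieval + LLM call.
    await asyncio.sleep(0.01)
    if not question.strip():
        raise ValueError("empty sub-question")
    return f"answer to: {question}"


async def answer_all(
    questions: List[str], max_concurrency: int = 3
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (question, answer, error) triples, in the original order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(question: str) -> str:
        # Limit how many sub-questions are in flight at once.
        async with semaphore:
            return await answer_sub_question(question)

    # return_exceptions=True keeps one failure from cancelling the rest.
    results = await asyncio.gather(
        *(guarded(q) for q in questions), return_exceptions=True
    )

    triples = []
    for question, result in zip(questions, results):
        if isinstance(result, Exception):
            triples.append((question, None, repr(result)))
        else:
            triples.append((question, result, None))
    return triples
```

Calling `asyncio.run(answer_all([...]))` only makes sense for sub-questions that do not depend on each other's answers; dependent ones would still have to run in order.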

-- Cool project! I like the fact that the agent decision-making is clearly shown in the UI. A few questions: 1) How do you handle LLM output inconsistencies? 2) Can the user change the prompts for tasks or sub-tasks if the output is not satisfactory? Overall, a great idea and this sub-question query engine might simplify some of the abstractions here.

1) What do you mean by LLM output inconsistencies? Most LLM responses are parsed, and then if that fails, an attempt to auto-fix them is made by re-running the previous output through a rewriting/schema prompt.
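As a rough illustration of that parse-then-auto-fix loop (not the actual waggle-dance code), assume the expected output is JSON and that `rewrite` is a hypothetical callable wrapping the rewriting/schema prompt. The name `max_repairs` is also an assumption.

```python
import json
from typing import Any, Callable


def parse_with_repair(
    raw: str, rewrite: Callable[[str, str], str], max_repairs: int = 2
) -> Any:
    """Try to parse `raw` as JSON; on failure, ask `rewrite` to fix it and retry."""
    text = raw
    for attempt in range(max_repairs + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            if attempt == max_repairs:
                # Out of repair attempts: surface the last parse error.
                raise
            # Feed the bad output and the parser error back to the rewriter.
            text = rewrite(text, str(err))


def fake_rewrite(bad_output: str, error: str) -> str:
    # Hypothetical stand-in for the LLM rewriting prompt: it only keeps
    # the outermost {...} span and ignores the error message.
    start, end = bad_output.find("{"), bad_output.rfind("}")
    return bad_output[start:end + 1]
```

Bounding the number of repairs matters here, since every repair is another model call.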

2) I want that feature too, and have it planned! I want to have a sort of knowledge / progress dashboard, where users can "chat their data". I also want to add to each sub-task the ability to restart from that point. Essentially, since the project is running on an entirely serverless architecture, this means serializing everything important, canceling current functions, and then re-hydrating from a certain point and calling the serverless functions again.
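The serialize / re-hydrate / restart idea could be sketched roughly as follows. This is an assumption-heavy toy, not the planned implementation: sub-tasks are assumed to run in a fixed linear order, the checkpoint is a JSON string of completed results, and `run` is a hypothetical callable standing in for a serverless function invocation.

```python
import json
from typing import Callable, Dict, List, Optional


def run_pipeline(
    subtasks: List[str],
    run: Callable[[str, Dict[str, str]], str],
    checkpoint: str = "{}",
    restart_from: Optional[str] = None,
) -> str:
    """Run sub-tasks in order, skipping any already stored in the checkpoint."""
    # Re-hydrate completed results from the serialized checkpoint.
    done: Dict[str, str] = json.loads(checkpoint)

    if restart_from is not None:
        # Discard the chosen sub-task and everything after it,
        # so those steps are executed again.
        for name in subtasks[subtasks.index(restart_from):]:
            done.pop(name, None)

    for name in subtasks:
        if name not in done:
            done[name] = run(name, done)

    # Serialize everything important so a later call can resume from it.
    return json.dumps(done)
```

A real serverless setup would presumably persist the checkpoint after every sub-task rather than only at the end, so that a canceled function loses at most one step.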

1) While building this system, I found that the LLM can sometimes generate unpredictable responses. For example, the LLM sometimes chooses to summarize the document even for a simple retrieval question. When using expensive LLM models, this mistake could result in 10x higher cost. In your case, the LLM could generate sub-tasks that incur significant operating overheads. Just curious if you're currently facing such issues and if you have plans to mitigate them.

2) The restart idea is neat! I often faced the scenario where only a few sub-questions have issues that need to be fixed. Tweaking them without re-running the whole pipeline seems like a useful feature in this case.