by rozap 1064 days ago
So this dumps the documents returned from the vector store into a prompt to the LLM. How does it work when there are many documents returned? What's the upper limit there?
1 comment

Yep. We use LangChain's basic text splitter to chunk the documents and the QA chain to stuff the chunks into the prompt. But AFAIK the chain doesn't check the context length, so that piece is still missing.

The upper limit depends on the model. Llama 2's context window is 4k tokens, and that includes the prompt.
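
For what it's worth, here is a rough sketch of what the missing check could look like. This is not what the project does, and it doesn't use any LangChain API. `fit_chunks`, `approx_tokens` and the 512-token answer reserve are made-up names and numbers for illustration, and the 4-characters-per-token estimate is only a stand-in for the model's real tokenizer.

```python
from typing import Callable, List

CONTEXT_LIMIT = 4096  # Llama 2 window, prompt included (the "4k" above)


def approx_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assume ~4 characters per token."""
    return max(1, len(text) // 4)


def fit_chunks(
    chunks: List[str],
    question: str,
    reserve_for_answer: int = 512,  # hypothetical headroom for the reply
    limit: int = CONTEXT_LIMIT,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Keep chunks in retrieval order until the token budget runs out."""
    budget = limit - reserve_for_answer - count_tokens(question)
    kept: List[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # the next chunk would overflow the window, so stop here
        kept.append(chunk)
        budget -= cost
    return kept
```

You'd call it on the retrieved chunks right before building the prompt, e.g. `fit_chunks(retrieved, question)`, and pass the model's own token counter as `count_tokens` to get accurate numbers. Dropping the lowest-ranked chunks is the simplest policy; summarizing the overflow instead would be another option.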