| HN Mirror

I don't think chunking + embedding based retrieval is good enough. It's a good first draft for a solution, but the chunks are out of context, so the LLM could combine them in an unintended way.

Better to question each document separately and then combine the answers into one last LLM round. Even so, there might be inter-document context that is lost - for example looking at one document that depends on details in another one. Large collections of documents should be loaded up in multiple passes, as the interpretation of a document can change when encountering information in another document. Adding one single piece of information to a collection of documents could slightly change the interpretation of everything, that's how humans incorporate new information.

One interesting application of document-collection based chat bots is finding inconsistencies and contradictions in the source text. If they can do that, they can correctly incorporate new information.