
"Document length" is the length of the text that contains the answer. "Context length" is how much text the model can process to produce the answer, and this number is fixed across their experiments.

When the document length is 2k, it likely fits within the context, so RAG can simply retrieve the entire document for the model to read. When the document is longer, RAG has to do real work to pick out the parts that contain the answer.
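To make the two regimes concrete, here is a rough sketch of what a retriever faces. It is not the paper's method: the function name `select_context`, the word-based length counting, the chunk size, and the keyword-overlap scoring are all assumptions made for illustration (real RAG setups typically count tokens and rank chunks with embeddings).

```python
def select_context(document, query, context_len=2048, chunk_len=256):
    """Pick the words of `document` to place in a fixed-size context.

    Hypothetical illustration: lengths are counted in whitespace-separated
    words, and chunks are ranked by simple keyword overlap with the query.
    """
    words = document.split()

    # Short document: it fits in the context, so no selection is needed.
    if len(words) <= context_len:
        return words

    # Long document: split into fixed-size chunks and rank them.
    chunks = [words[i:i + chunk_len] for i in range(0, len(words), chunk_len)]
    query_terms = set(query.lower().split())

    def score(chunk):
        # Count how many words in the chunk also appear in the query.
        return sum(1 for w in chunk if w.lower() in query_terms)

    ranked = sorted(range(len(chunks)), key=lambda i: score(chunks[i]), reverse=True)

    # Greedily take the best-scoring chunks that still fit in the budget.
    budget = context_len
    chosen = []
    for i in ranked:
        if len(chunks[i]) <= budget:
            chosen.append(i)
            budget -= len(chunks[i])

    # Restore document order so the selected text reads coherently.
    chosen.sort()
    return [w for i in chosen for w in chunks[i]]
```

With a short document the function returns everything, which is the easy 2k case. With a long one, the quality of the answer depends entirely on whether the ranking step happens to surface the chunk that holds it.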

The "extended mind" approach can always query tokens across the entire document, though evidently less effectively than if those tokens were included in the context directly.