by runnedrun 948 days ago
Oh interesting. From what I understand, then, this is great for small-context models compared to RAG? Is there research into making it more effective for large-context models (since the context size of major models seems to be quadrupling every 6 months at this point)?

It appears that RAG actually dominates at 2k context lengths compared to this method, but that this method outperforms it more and more as the context gets longer (see the graph titled "Retrieval Benchmark Results, by Document Length").
"Document length" is the length of the text that contains the answer. "Context length" is how much text the model can process to produce the answer, and this number is fixed across their experiments.

When the document length is 2k, it's likely smaller than the context and RAG can just retrieve the entire document to have the model read it. When the document is longer, RAG needs to actually do some work to pick the parts that contain the answer.
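To make that distinction concrete, here is a rough toy sketch of the two regimes. It is not the benchmark's actual pipeline: the function name `build_rag_context`, the word-count stand-in for tokens, the fixed chunk size, and the word-overlap scoring are all assumptions made for illustration.

```python
def build_rag_context(document: str, query: str,
                      context_budget: int, chunk_size: int = 50) -> str:
    """Toy RAG front end: return the text that would be handed to the model.

    Hypothetical helper; words stand in for tokens.
    """
    words = document.split()

    # Short document: it fits in the context, so nothing has to be selected.
    if len(words) <= context_budget:
        return document

    # Long document: split into fixed-size chunks and rank them against the query.
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    query_terms = set(query.lower().split())

    def score(chunk: str) -> int:
        # Assumed scoring: number of distinct query words present in the chunk.
        return len(query_terms & set(chunk.lower().split()))

    ranked = sorted(range(len(chunks)),
                    key=lambda i: score(chunks[i]), reverse=True)

    # Keep as many top chunks as fit (at least one), in document order.
    keep = sorted(ranked[:max(1, context_budget // chunk_size)])
    return " ".join(chunks[i] for i in keep)
```

In the first branch retrieval cannot miss the answer, because the model sees everything. In the second branch the answer only reaches the model if the scoring step happens to rank its chunk highly, which is where longer documents start to hurt.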

The "extended mind" can always query tokens across the entire document, though evidently it does so less effectively than if those tokens were included in the context.