by measured_step
949 days ago
It appears that RAG actually outperforms this method at 2k context lengths, but this method beats RAG by a growing margin the longer the context gets (see the graph titled "Retrieval Benchmark Results, by Document Length").

When the document length is 2k, it's likely smaller than the context window, so RAG can just retrieve the entire document and have the model read it. When the document is longer, RAG actually has to do some work to pick the parts that contain the answer.
The "extended mind" can always query tokens across the entire document, though evidently less effectively than if those tokens were included in the context.
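
To make the short-vs-long distinction concrete, here's a rough sketch of the RAG side of that argument. Everything in it is an assumption for illustration: `build_context`, the whitespace "tokenizer", the word-overlap score, and the budget numbers are all hypothetical and not taken from the benchmark. A real pipeline would use a proper tokenizer and embedding similarity.

```python
def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def overlap_score(chunk, query_terms):
    """Toy relevance score: how many chunk tokens appear in the query."""
    return sum(1 for tok in chunk if tok.lower() in query_terms)


def build_context(document, query, context_budget=2048, chunk_size=256):
    """Return (context_text, retrieval_was_needed).

    Assumption: one whitespace-separated word counts as one token.
    """
    tokens = document.split()

    # Short document: it fits in the budget, so "retrieval" is trivial
    # and the model simply reads the whole thing.
    if len(tokens) <= context_budget:
        return document, False

    # Long document: rank chunks and keep only as many as fit.
    query_terms = {t.lower() for t in query.split()}
    chunks = split_into_chunks(tokens, chunk_size)
    k = max(1, context_budget // chunk_size)
    top = sorted(
        range(len(chunks)),
        key=lambda i: overlap_score(chunks[i], query_terms),
        reverse=True,
    )[:k]

    # Restore document order so the selected text reads naturally.
    selected = sorted(top)
    return " ".join(" ".join(chunks[i]) for i in selected), True
```

In the short case nothing can be missed, which fits RAG doing well at 2k. In the long case the answer only reaches the model if the ranking step happens to select the right chunk.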