Hacker News new | ask | show | jobs
by pests 361 days ago
Initial prompt processing with a large static context (system prompt + tools + whatever) could technically be improved by checkpointing the model state and reusing for future prompts. Not sure if any tools support this.
1 comments

Dropping in late into this discussion, but is there any way to "comfortably" use multiple precomputed kv-caches with current models, in the style of this work: https://arxiv.org/abs/2212.10947 ?

Meaning, I pre-parse multiple documents, and the prompt and completion attention sees all of them, but there is no attention between the documents (they are all encoded in the same overlapping positions).

This way you can include basically unlimited amount of data in the prompt, paying for it with the perfomance.