by sshumaker 756 days ago
This is a pretty standard technique if you're running the models yourself. ChatGPT, for example, almost certainly does this.

There's even more sophisticated work in this area that allows 'template'-style partial caching: https://arxiv.org/abs/2311.04934