by sshumaker
756 days ago
This is a pretty standard technique if you're running the models yourself; ChatGPT, for example, almost certainly does this. There's even more sophisticated work in this area that allows 'template'-style partial caching:
https://arxiv.org/abs/2311.04934
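
To make the basic idea concrete, here's a rough toy sketch of plain prefix caching (not the template approach from the paper, and not how any particular provider implements it). Everything in it is hypothetical: `PrefixCache`, `run`, and `_process` are names made up for illustration, and the "state" is just a list standing in for the attention key/value cache a real inference server would keep.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache: maps a token prefix to a stand-in for its computed state."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], List[int]] = {}
        self.computed_tokens = 0  # number of tokens actually "processed"

    def _process(self, state: List[int], tokens: List[int]) -> List[int]:
        # Stand-in for running the model over `tokens`. A real system would
        # extend the attention key/value cache here instead.
        self.computed_tokens += len(tokens)
        return state + [t * 2 for t in tokens]

    def run(self, tokens: List[int]) -> List[int]:
        # Look for the longest cached prefix of the prompt.
        for cut in range(len(tokens), 0, -1):
            cached = self._store.get(tuple(tokens[:cut]))
            if cached is not None:
                # Only the uncached suffix needs to be processed.
                state = self._process(list(cached), tokens[cut:])
                break
        else:
            # Nothing cached: process the whole prompt.
            state = self._process([], tokens)
        self._store[tuple(tokens)] = state
        return state
```

If you first call `run` on a shared system prompt and then on that prompt plus different user messages, only the new suffix tokens get counted in `computed_tokens`. That saving is the whole point; the template-style work linked above goes further by reusing cached segments that aren't strictly a prefix.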