by mmoskal
453 days ago
Just to clarify: a simple prefix KV cache doesn't require any special model training. It does require support in the inference framework, but most frameworks have it by now. You can see dramatic improvements in latency and throughput when the queries share a large common prefix.
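To illustrate why no training is involved, here is a toy sketch of the idea. Everything in it is hypothetical: `PrefixKVCache` and `fake_kv` are made-up names, and `fake_kv` just hashes the prefix as a stand-in for the real per-layer key/value tensors, which real frameworks usually manage in fixed-size blocks. The point it shows is that a token's KV entry depends only on that token and everything before it. So any query that starts with an already-seen prefix can reuse those entries and only needs a forward pass over its new suffix.

```python
from typing import Dict, List, Tuple


def fake_kv(prefix: Tuple[int, ...]) -> int:
    # Stand-in for a real key/value entry. Like real KV entries, it depends on
    # the last token *and* every token before it (hence keyed by the prefix).
    return hash(prefix)


class PrefixKVCache:
    """Toy prefix cache: maps each seen prefix to the KV entry of its last token."""

    def __init__(self) -> None:
        self._kv: Dict[Tuple[int, ...], int] = {}
        self.computed = 0  # how many tokens actually needed a forward pass

    def prefill(self, tokens: List[int]) -> List[int]:
        kvs = []
        for i in range(len(tokens)):
            key = tuple(tokens[: i + 1])
            if key not in self._kv:
                # Cache miss: "run the model" for this position.
                self._kv[key] = fake_kv(key)
                self.computed += 1
            kvs.append(self._kv[key])
        return kvs


if __name__ == "__main__":
    cache = PrefixKVCache()
    shared = list(range(1000))          # e.g. a long shared system prompt
    cache.prefill(shared + [1, 2, 3])
    print(cache.computed)               # 1003: first query pays for everything
    cache.prefill(shared + [7, 8])
    print(cache.computed)               # 1005: second query only computes 2 tokens
```

The same model weights produce the same KV entries either way; caching only skips recomputing them. That's why the savings scale with the length of the shared prefix.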