by Kubuxu
974 days ago
Your KV cache size grows linearly with context length, which can leave you tight on memory. There is also the added cost of recomputing the KV cache for the context window whenever the window has to slide, but this is close to being solved by streaming LLMs.
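To make the linear growth concrete, here is a rough back-of-the-envelope sketch. It assumes a standard decoder-only transformer that stores one key and one value vector per layer, per KV head, per token. The function name `kv_cache_bytes` and the model dimensions (32 layers, 32 KV heads, head size 128, fp16) are illustrative assumptions, not figures from any specific model discussed here.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    All default dimensions are hypothetical, chosen only for illustration.
    The leading factor of 2 accounts for storing both keys and values.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Doubling the context length doubles the cache size.
    for ctx in (4096, 8192, 32768):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"context {ctx:>6}: {gib:.1f} GiB")
```

With these assumed dimensions the cache costs 512 KiB per token, so a 4096-token context needs about 2 GiB on top of the model weights. Models that use fewer KV heads (grouped-query attention) or a quantized cache would shrink the per-token constant, but the growth stays linear in context length.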