|
|
|
|
|
by magicalhippo
443 days ago
|
|
https://github.com/deepseek-ai/3FS?tab=readme-ov-file#3-kvca... KVCache is a technique used to optimize the LLM inference process. It avoids redundant computations by caching the key and value vectors of previous tokens in the decoder layers. The top figure demonstrates the read throughput of all KVCache clients (1×400Gbps NIC/node), highlighting both peak and average values, with peak throughput reaching up to 40 GiB/s |
|