by zozbot234
35 days ago
KV cache size is the main constraint on batching (for any given ctx length), and that's a huge deal for efficiency both locally and in the data center. DeepSeek V4's reduced KV requirement is a real game changer: it unlocks batching requests together for local inference, not just at scale.
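
To put rough numbers on that, here's a back-of-the-envelope sketch of how KV cache size caps batch size at a fixed context length. The model shape (32 layers, 8 KV heads, head dim 128, fp16) and the 8x reduction factor are made-up illustrative values, not DeepSeek V4's actual figures, and the formula assumes a plain per-layer K/V cache with no paging or quantization.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers."""
    # Factor 2: each layer stores one K and one V vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_batch(mem_budget_bytes, ctx_len, per_token_bytes):
    """How many full-length sequences fit in the memory left for KV cache."""
    return mem_budget_bytes // (ctx_len * per_token_bytes)


# Hypothetical model shape (NOT any real model's config).
baseline = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

# Assumed 8x smaller KV footprint, purely for illustration.
reduced = baseline // 8

budget = 8 * 1024**3   # 8 GiB left over for KV cache after weights
ctx = 32768            # context length per request

print("baseline KiB/token:", baseline // 1024)
print("baseline max batch:", max_batch(budget, ctx, baseline))
print("reduced  max batch:", max_batch(budget, ctx, reduced))
```

With these assumed numbers the same 8 GiB goes from fitting 2 concurrent 32k-token requests to 16, which is the difference between "basically no batching" and "useful batching" on a local box.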