by barbegal
75 days ago
Does the KV cache really grow to use more memory than the model weights? The reduction in overall RAM relies on the KV cache being a substantial proportion of memory usage, but with very large models I can't see how that holds true.
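
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes a standard transformer that caches one key and one value vector per layer, per KV head, per token, with 16-bit weights and cache entries. The model shape used below (70B parameters, 80 layers, 8 KV heads, head dimension 128) is a hypothetical example chosen for illustration, not a figure from the article, and the function names are my own.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries (an assumption).
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def weight_bytes(n_params, bytes_per_param=2):
    """Approximate weight memory in bytes, assuming 16-bit parameters."""
    return n_params * bytes_per_param


def breakeven_tokens(n_params, n_layers, n_kv_heads, head_dim,
                     bytes_per_elem=2, bytes_per_param=2):
    """Total cached tokens (seq_len * batch_size) at which the KV cache
    matches the weights in size."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim,
                               seq_len=1, batch_size=1,
                               bytes_per_elem=bytes_per_elem)
    return weight_bytes(n_params, bytes_per_param) / per_token


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    params, layers, kv_heads, dim = 70_000_000_000, 80, 8, 128

    weights = weight_bytes(params)
    cache = kv_cache_bytes(layers, kv_heads, dim, seq_len=16_384, batch_size=32)

    print(f"weights:  {weights / 1e9:.1f} GB")
    print(f"KV cache: {cache / 1e9:.1f} GB (32 sequences x 16,384 tokens)")
    print(f"break-even at ~{breakeven_tokens(params, layers, kv_heads, dim):,.0f} cached tokens")
```

Under these assumptions the cache costs 327,680 bytes per token, so a single short conversation is tiny next to the weights. The cache only overtakes the weights once the total number of cached tokens across all concurrent sequences reaches the hundreds of thousands, which suggests the answer depends mostly on batch size and context length rather than on model size alone.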