by flux3125
61 days ago
In my experience, if you're coding or doing something that requires precision, quantizing the KV cache is definitely not worth it. If you're just chatting or doing less precision-sensitive things, it's 1000% worth going down to Q8, or sometimes even Q4.