by flux3125 61 days ago
In my experience, if you're coding or doing anything else that requires precision, quantizing the KV cache is definitely not worth it.

If you're just chatting or doing less precision-sensitive things, it's 1000% worth going down to Q8, or sometimes even Q4.
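
For a rough sense of how much memory is at stake, here's a back-of-the-envelope sketch. It assumes llama.cpp-style block formats (Q8_0 as 34 bytes per 32 values, Q4_0 as 18 bytes per 32 values), and the model shape in the example is made up, so plug in your own numbers. It says nothing about the quality hit, only the size.

```python
# Rough KV cache size estimate for different cache types.
# Assumption: block layouts follow llama.cpp's Q8_0 / Q4_0 formats,
# where 32 values share one fp16 scale. Other runtimes may differ.
CACHE_TYPES = {
    # name: (bytes per block, elements per block)
    "f16": (2, 1),
    "q8_0": (34, 32),  # 32 int8 values + 2-byte scale
    "q4_0": (18, 32),  # 32 4-bit values (16 bytes) + 2-byte scale
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, cache_type):
    """Return the approximate KV cache size in bytes for a full context."""
    bytes_per_block, elems_per_block = CACHE_TYPES[cache_type]
    # Factor of 2: one K tensor and one V tensor per layer.
    n_elements = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return n_elements * bytes_per_block // elems_per_block


if __name__ == "__main__":
    # Hypothetical model shape, purely for illustration.
    for cache_type in CACHE_TYPES:
        size = kv_cache_bytes(32, 8, 128, 32768, cache_type)
        print(f"{cache_type}: {size / 2**30:.3f} GiB")
```

With those assumptions, Q8 costs a bit more than half of f16 and Q4 a bit more than a quarter, so whether the savings matter depends mostly on how long your context is.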