HN Mirror


	by Eisenstein 496 days ago
	We are talking about dynamically quantizing the KV cache, not the model weights.
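
	To make the distinction concrete, here is a rough sketch of what quantizing the KV cache on the fly could look like: each key/value vector is converted to 8-bit blocks as it is appended during generation, while the model weights are never touched. The block size of 32, the per-block scale of max/127, and all the names (`quantize_q8`, `dequantize_q8`, `QuantizedKVCache`) are assumptions for illustration, not any particular runtime's implementation.

```python
# Hypothetical sketch of 8-bit KV-cache quantization (pure Python, no dependencies).
# All names and the block layout here are illustrative assumptions.

BLOCK = 32  # assumed number of values sharing one scale factor


def quantize_q8(values):
    """Split a list of floats into blocks of (scale, list of ints in [-127, 127])."""
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = amax / 127.0 if amax > 0 else 1.0
        quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8(blocks):
    """Rebuild approximate floats from (scale, quants) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


class QuantizedKVCache:
    """Quantizes each key/value vector at append time, i.e. during inference."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # The quantization happens here, per token, not when the model is loaded.
        self.keys.append(quantize_q8(key))
        self.values.append(quantize_q8(value))

    def key(self, position):
        return dequantize_q8(self.keys[position])

    def value(self, position):
        return dequantize_q8(self.values[position])


cache = QuantizedKVCache()
original_key = [i / 10 - 3 for i in range(64)]
cache.append(original_key, original_key)
max_error = max(abs(a - b) for a, b in zip(original_key, cache.key(0)))
```

	In this sketch the rounding error per element is at most half of the block's scale, so `max_error` stays small relative to the largest value in each block.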

1 comment

I run the KV cache at Q8 even on that model. Is it not working well for you?