

by kgeist 68 days ago

Llama.cpp already uses an idea from it internally for the KV cache [0]

So a quantized KV cache should now see less degradation.