Hacker News
by chr15m 80 days ago
Is this something that will show up in Ollama any time soon to increase the context size of local models?

KV quantization has long been available in llama.cpp.
Yes, but the optimisation described has not, right?