Nice. That said... I mean...

> The journey to integrate K/V context cache quantisation into Ollama took around 5 months.

??

They incorrectly tagged #7926, which is a 2-line change, instead of #6279, where it was implemented. That made me dig a bit deeper, and reading the actual change, it seems the commit [1] is:

> params := C.llama_context_default_params()
> ...
> params.type_k = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) <--- adds this
> params.type_v = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) <--- adds this

Which has been part of llama.cpp since Dec 7, 2023 [2].

So... mmmm... while this is great, somehow I'm left feeling kind of vaguely put off by the comms around what is really 'we finally support some config flag from llama.cpp that's been there for really quite a long time'.

> It took 5 months, but we got there in the end.

... I guess... yay? The challenges don't seem like they were technical, but I guess, good job getting it across the line in the end?

[1] - https://github.com/ollama/ollama/commit/1bdab9fdb19f8a8c73ed...
[2] - since https://github.com/ggerganov/llama.cpp/commit/bcc0eb4591bec5...

The full release seems to contain more code [1], and the author references the llama.cpp pre-work and its author as well.
This person is also not a core contributor, so this reads as a hobbyist and fan of AI dev who is writing about their work. Nothing to be ashamed of IMO.
[1] - https://github.com/ollama/ollama/compare/v0.4.7...v0.4.8-rc0