by refulgentis
306 days ago
The relationship between Ollama and llama.cpp is much closer than it might seem. Ollama is llama.cpp with a nice little installer GUI and a nice little server binary. llama.cpp has a server binary as well, but no nice installer GUI. The only time recently that Ollama had a feature llama.cpp didn't was when they patched in SWA (sliding-window attention) with Google; llama.cpp had it a couple of weeks later. Ollama is significantly behind llama.cpp in important areas, ex. the Gemma blog post, they note they'll get on tool calls and multimodal real soon now.
> Ollama is significantly behind llama.cpp in important areas, ex. the Gemma blog post, they note they'll get on tool calls and multimodal real soon now.
If you don't use those things, you don't need to care. I'll just use another model that works.
And that's the thing, really. Most folks don't give a shit about getting the maximum performance. They're probably not even keeping their GPU busy all the time. They just need it to work consistently without having to worry about nonsense. llama.cpp simply isn't that tool.