by wokwokwok
568 days ago
I'll be generous and just say, maybe people should just use llama.cpp and not ollama if they care about having nice things, if merging support for existing features is that difficult. It seems like it's probably a better choice overall.

That said, I'm sure people worked very hard on this, and it's nice to see it as a part of ollama for the people that use it.

Also:

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

https://news.ycombinator.com/newsguidelines.html
With llama.cpp running on a machine, how do you connect your LLM clients to it and request that a model be loaded with a given set of parameters and templates?
… you can’t, because llama.cpp is the inference engine, and its bundled llama-cpp-server binary only provides relatively basic server functionality. It’s really more of a demo/example or MVP.
Llama.cpp is configured entirely at launch: you run the binary and manually pass it command-line args for the one specific model and configuration you start it with.
Ollama provides a server and client for interfacing and packaging models, such as:
Ollama is not “better” or “worse” than llama.cpp because it’s an entirely different tool.
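To make the difference concrete, here is a minimal sketch of a client picking the model and its parameters per request over Ollama's HTTP API. It assumes Ollama's default local endpoint (`http://localhost:11434/api/generate`); the model name `llama3` and the helper names `build_generate_request` and `generate` are made up for illustration, not anything from the thread.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, **options):
    """Build the JSON body for a non-streaming generate call.

    Keyword arguments (e.g. temperature, num_ctx) go into "options",
    so each request carries its own model name and parameters.
    """
    body = {"model": model, "prompt": prompt, "stream": False}
    if options:
        body["options"] = options
    return body


def generate(model, prompt, **options):
    """Send the request to a running Ollama server and return the reply text."""
    payload = build_generate_request(model, prompt, **options)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server; "llama3" is a hypothetical model name):
#   print(generate("llama3", "Why is the sky blue?", temperature=0.2, num_ctx=4096))
```

The point of the sketch is only that the model and its options travel with the request, rather than being fixed by the command line the server was started with.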