HN Mirror


by mchiang 314 days ago

Thanks for the kind words.

Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.

Ollama might look like a toy, something trivial to build. I can say that, to keep it simple, we go through a great deal of struggle to make it work with the experience we want.

Simplicity is often overlooked, but we want to build the world we want to see.

3 comments

dcreater 314 days ago

But Ollama is a toy; it's meaningful for hobbyists and individuals like myself to use locally. Why would it be the right choice for anything more? AWS, vLLM, SGLang, etc. would be the solutions for enterprise.

I knew a startup that deployed ollama on a customer's premises, and when I asked them why, they had absolutely no good reason. Likely they did it because it was easy. That's not the "easy to use" case you want to solve for.


mchiang 314 days ago

I can say that, having tried many inference tools after the launch, many do not have the models implemented well, especially OpenAI’s harmony.

Why does this matter? For this specific release, we benchmarked against OpenAI’s reference implementation to make sure Ollama is on par. We also spent a significant amount of time getting harmony implemented the way it was intended.

I know vLLM also worked hard to implement against the reference and has shared its benchmarks publicly.


jnmandal 314 days ago

Honestly, I think it just depends. A few hours ago I wrote that I would never want it for a production setting, but actually, if I were standing something up myself and could just download headless ollama and know it would work, hey, that would most likely be fine too. Maybe later on I'd revisit it from a devops perspective and refactor the deployment methodology/stack, etc. Maybe I'd benchmark it and realize it's fine, actually. Sometimes you just need to make your whole system work.

We can obviously disagree with their priorities, their roadmap, the fact that the client isn't FOSS (I wish it was!), etc., but no one can say that ollama doesn't work. It works. And like mchiang said above: it's dead simple, on purpose.


dcreater 314 days ago

But it's effectively equally easy to do the same with llama.cpp, vllm or modular.

(any differences are small enough that they either shouldn't cause the human much work or can very easily be delegated to AI)


evilduck 313 days ago

Llama.cpp is not really that easy unless you're supported by their prebuilt binaries. Go to the llama.cpp GitHub page and find a prebuilt CUDA-enabled release for a Fedora-based Linux distro. Oh, there isn't one, you say? Welcome to losing an hour or more of your time.

Then you want to swap models on the fly. llama-swap, you say? You now get to learn a new custom YAML-based config file syntax that does basically nothing the Ollama model file doesn't already do, so that you can ultimately... have the same experience as Ollama, but now you've lost hours just to get back to square one.

Then you need it to start and be ready with the system reboot? Great, now you get to write some systemd services, move stuff into system-level folders, create some groups and users and poof, there goes another hour of your time.


jnmandal 313 days ago

Sure, but if some of my development team is using ollama locally because it was super easy to install, maybe I don't want to worry about maintaining a separate build chain for my prod env. Many startups are just wrapping or enabling LLMs and just need a running server. Who are we to say what is the right use of their time and effort?


leopoldj 313 days ago

> Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library

Where can I learn more about this? llama.cpp is an inference application built using the ggml library. Does this mean Ollama now has its own code for what llama.cpp does?


guipsp 313 days ago

https://github.com/ollama/ollama/tree/main/model/models


buyucu 314 days ago

This kind of gaslighting is exactly why I stopped using Ollama.

GGML library is llama.cpp. They are one and the same.

Ollama made sense when llama.cpp was hard to use. Ollama does not have a value proposition anymore.


mchiang 314 days ago

It’s a different repo. https://github.com/ggml-org/ggml

The models are implemented by Ollama https://github.com/ollama/ollama/tree/main/model/models

I can say as a fact that, for the gpt-oss model, we also implemented our own MXFP4 kernel and benchmarked it against the reference implementations to make sure Ollama is on par. We implemented harmony and tested it. This should significantly impact tool calling capability.

I’m not sure if I’m feeding here. We really love what we do, and I hope it shows in our product, in Ollama’s design and in our voice to our community.

You don’t have to like Ollama. That’s subjective to your taste. As a maintainer, I certainly hope to have you as a user one day. If we don’t meet your needs and you want to use an alternative project, that’s totally cool too. It’s the power of having a choice.
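
For readers unfamiliar with the MXFP4 format mentioned above, a minimal Python sketch of dequantizing one block may help. It assumes the layout described in the OCP Microscaling (MX) specification: 32 four-bit E2M1 elements sharing one 8-bit E8M0 power-of-two scale. The function name and the nibble packing order (low nibble first) are illustrative assumptions, and this is not Ollama's kernel.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def dequantize_mxfp4_block(scale_byte: int, packed: bytes) -> list[float]:
    """Decode one MXFP4 block: 16 bytes holding 32 four-bit codes plus a shared scale.

    Assumption: the low nibble of each byte holds the earlier element.
    Real implementations may pack nibbles differently.
    """
    if len(packed) != 16:
        raise ValueError("an MXFP4 block holds 32 elements (16 packed bytes)")
    if scale_byte == 0xFF:
        raise ValueError("E8M0 value 0xFF encodes NaN")
    scale = 2.0 ** (scale_byte - 127)  # E8M0: unsigned exponent with bias 127
    values = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            sign = -1.0 if code & 0x8 else 1.0
            values.append(sign * E2M1_MAGNITUDES[code & 0x7] * scale)
    return values
```

With a scale byte of 127 the shared scale is 1, so each element decodes to its raw E2M1 value; every increment of the scale byte doubles the whole block.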


mark_l_watson 313 days ago

Hello, thanks for answering questions here.

Is there a schedule for adding more models to the Turbo plan, beyond gpt-oss 20b/120b? I wanted to try your $20/month Turbo plan, but I would like to be able to experiment with a few other large models.


buyucu 312 days ago

This is exactly what I mean by gaslighting.

GGML is llama.cpp. It is developed by the same people as llama.cpp and powers everything llama.cpp does. You must know that. The fact that you are ignoring it is very dishonest.


scosman 313 days ago

> GGML library is llama.cpp. They are one and the same.

Nope…
