by The_Amp_Walrus 525 days ago
I might be wrong about this, but doesn't Ollama do some work to ensure the model runs efficiently on your hardware? For example, choosing how much GPU memory to use so you don't OOM. Does llama.cpp do that for you with zero config?
2 comments

Yes, Ollama automatically determines the number of layers to offload based on available VRAM.
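
To make the idea concrete, here is a rough sketch of that kind of estimate. This is not Ollama's actual logic: the function name, the fixed reserve, and the assumption that every layer is the same size are all made up for illustration. The result is the sort of number you would otherwise pass by hand to llama.cpp via `-ngl` / `--n-gpu-layers`.

```python
MIB = 1024 ** 2


def layers_to_offload(free_vram_bytes, n_layers, bytes_per_layer,
                      reserve_bytes=512 * MIB):
    """Estimate how many layers fit in free VRAM (hypothetical helper).

    Assumptions, not taken from Ollama or llama.cpp:
    - every layer needs the same amount of memory (bytes_per_layer)
    - a fixed reserve is held back for the KV cache and scratch buffers
    """
    usable = free_vram_bytes - reserve_bytes
    if usable <= 0 or bytes_per_layer <= 0:
        return 0  # nothing fits, keep the whole model on the CPU
    # Never offload more layers than the model actually has.
    return min(n_layers, usable // bytes_per_layer)


if __name__ == "__main__":
    # Example: 8 GiB free, 32 layers of roughly 400 MiB each.
    print(layers_to_offload(8 * 1024 * MIB, 32, 400 * MIB))
```

A real implementation would also have to account for context length, since the KV cache grows with it, so treat the fixed reserve as a placeholder.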
I would even say that Ollama is a step back. For example, llama.cpp supports Vulkan, which is a huge game changer for consumer-grade hardware. Ollama does not support Vulkan, even though it would probably be fairly easy to add.

If you care about running efficiently on your hardware, then llama.cpp is the way to go, not Ollama.