by behnamoh 882 days ago
What I hate about ollama is that it makes server configuration a PITA. ollama relies on llama.cpp to run GGUF models but while llama.cpp can keep the model in memory using `mlock` (helpful to reduce inference times), ollama simply won't let you do that:

https://github.com/ollama/ollama/issues/1536

Not to mention, they hide all the server configs in favor of their own "sane defaults".

1 comment

Sorry this isn't easier!

You can enable mlock manually in the /api/generate and /api/chat endpoints by specifying the "use_mlock" option:

{"options": {"use_mlock": true}}

Many other server configurations are also available there: https://github.com/ollama/ollama/blob/main/docs/api.md#reque...
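For anyone who wants to try this from a script, here is a minimal Python sketch of a /api/generate call with "use_mlock" set. It assumes the server is listening on the default local address (http://localhost:11434); the helper names `build_generate_request` and `generate` are made up for illustration, and the model name is whatever you have pulled locally.

```python
import json
import urllib.request

# Assumption: the default local Ollama address; change it if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, use_mlock=True):
    """Build the JSON body for /api/generate with use_mlock set under "options"."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a streamed reply
        "options": {"use_mlock": use_mlock},
    }


def generate(model, prompt, use_mlock=True, url=OLLAMA_URL):
    """POST the request to the server and return the decoded JSON response."""
    body = json.dumps(build_generate_request(model, prompt, use_mlock)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, something like `generate("llama2", "Why is the sky blue?")["response"]` should return the completion text, where "llama2" is only a placeholder model name. Other entries can go in the same "options" dict alongside "use_mlock".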

I think an FAQ answering these kinds of questions would be useful for users.