by behnamoh
882 days ago
What I hate about ollama is that it makes server configuration a PITA. ollama relies on llama.cpp to run GGUF models, and while llama.cpp can keep the model in memory using `mlock` (helpful to reduce inference times), ollama simply won't let you do that: https://github.com/ollama/ollama/issues/1536 Not to mention, they hide all the server configs in favor of their own "sane defaults".
You can enable mlock manually in the /api/generate and /api/chat endpoints by specifying the "use_mlock" option:
{"options": {"use_mlock": true}}
Many other server configurations are also available there: https://github.com/ollama/ollama/blob/main/docs/api.md#reque...
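For illustration, here is a minimal sketch of sending that option to /api/generate from Python using only the standard library. It assumes the server is on Ollama's default local address (http://localhost:11434) and that your Ollama version still accepts "use_mlock" under "options"; the function names `build_payload` and `generate` are hypothetical helpers, not part of any Ollama client library.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, use_mlock: bool = True) -> dict:
    """Build a /api/generate request body with mlock passed via "options"."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # request one JSON object instead of a streamed reply
        "options": {"use_mlock": use_mlock},  # per-request runtime option
    }


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """POST the request to the server and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": false the reply is a single object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

Usage would look like `generate("llama2", "Why is the sky blue?")`, where "llama2" is only a placeholder for whatever model you have pulled. The same "options" object should work for /api/chat, since both endpoints accept it.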