by tarruda
322 days ago
> Also I'm not sure if ollama supports a kv-cache between invocations of /v1/completions, which could help)

Not sure about ollama, but llama-server does have a transparent kv cache.

You can run it with

  llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none

Web UI at http://localhost:8080 (also OpenAI compatible API)
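
For anyone who wants to poke at the cache from a script, here is a rough sketch of hitting the OpenAI-compatible endpoint with two prompts that share a long prefix. The address and the /v1/completions path come from the thread; the helper names (build_request, complete), the max_tokens value and the example prompts are made up for illustration. The assumption is that the server reuses the cached prefix on the second call, so it should come back faster, but I have not measured that here.

```python
import json
import urllib.request

# Address from the comment above; adjust if llama-server listens elsewhere.
BASE_URL = "http://localhost:8080"


def build_request(shared_prefix, question, max_tokens=64):
    """Build a POST request for /v1/completions (hypothetical helper).

    Keeping shared_prefix byte-identical across calls is what would let
    a server-side prompt cache reuse the already-processed tokens.
    """
    payload = {
        "prompt": shared_prefix + question,
        "max_tokens": max_tokens,  # illustrative value, not from the thread
    }
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(shared_prefix, question):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(shared_prefix, question)) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses carry the text in choices[0].text.
    return body["choices"][0]["text"]


# Example usage, with llama-server already running:
#   prefix = "Some long shared context...\n\n"
#   print(complete(prefix, "Q: First question?\nA:"))
#   print(complete(prefix, "Q: Second question?\nA:"))
```

Timing the two complete() calls is an easy way to check whether the prefix is actually being reused on your setup.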