by coder68
64 days ago
Hmm, you might be able to tweak the settings further. With llama.cpp on a single RTX 6000 Pro I get ~215 tok/s generation speed. The key for me was setting min_p greater than 0. My settings:
```
#!/bin/bash
llama-server \
-hf ggml-org/gpt-oss-120b-GGUF \
-c 0 \
-np 1 \
--jinja \
--no-mmap \
--temp 1.0 \
--top-p 1.0 \
--min-p 0.001 \
--chat-template-kwargs '{"reasoning_effort": "high"}' \
--host 0.0.0.0
```
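For anyone wondering what min_p actually does: it drops every candidate token whose probability is below min_p times the probability of the most likely token. Here is a rough sketch of that rule on a toy distribution. The `min_p_filter` helper and the four probabilities are made up for illustration; this is not llama.cpp's sampler code, just the thresholding idea.

```shell
#!/bin/bash
# Hypothetical helper: reads one probability per line on stdin and prints
# only those that are >= min_p * (largest probability seen).
min_p_filter() {
  awk -v min_p="$1" '
    { p[NR] = $1; if ($1 > max) max = $1 }   # remember values, track the max
    END {
      for (i = 1; i <= NR; i++)
        if (p[i] >= min_p * max) print p[i]  # keep tokens above the cutoff
    }'
}

# Toy distribution: with min_p = 0.001 the cutoff is 0.001 * 0.6 = 0.0006,
# so the 0.0001 tail token is dropped and the other three are kept.
printf '%s\n' 0.6 0.3 0.0999 0.0001 | min_p_filter 0.001
```

With min_p set to 0 the cutoff is 0 and nothing is filtered, so the sampler has to consider the whole vocabulary. Whether that fully accounts for the speed difference I saw is a guess on my part.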