| HN Mirror


by coder68 112 days ago

Hmm, you might be able to tweak your settings further. With llama.cpp on a single RTX 6000 Pro I get ~215 tok/s generation speed. The key for me was setting min_p above 0. My settings:

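```
#!/bin/bash

# Serve gpt-oss-120b (GGUF pulled from Hugging Face) with llama.cpp's server.
# -c 0 takes the context size from the model; -np 1 uses a single server slot.
# --min-p 0.001 is the small non-zero min_p mentioned above.
llama-server \
  -hf ggml-org/gpt-oss-120b-GGUF \
  -c 0 \
  -np 1 \
  --jinja \
  --no-mmap \
  --temp 1.0 \
  --top-p 1.0 \
  --min-p 0.001 \
  --chat-template-kwargs '{"reasoning_effort": "high"}' \
  --host 0.0.0.0
```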
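For anyone unfamiliar with min_p: llama.cpp describes it as the minimum probability for a token to be considered, relative to the probability of the most likely token. Below is a rough sketch of that filtering step on a made-up distribution. `min_p_filter` is a hypothetical helper (not part of llama.cpp), and it assumes input lines of the form `token probability`. It only shows what the filter keeps, not why it speeds up generation. One plausible but unverified guess is that later sampling steps then have far fewer candidate tokens to handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a llama.cpp API: min_p_filter MIN_P
# Reads "token probability" lines on stdin and prints only the tokens whose
# probability is at least MIN_P times the probability of the most likely token.
min_p_filter() {
  local min_p="$1"
  awk -v min_p="$min_p" '
    BEGIN { pmax = 0 }
    # Remember every candidate and track the highest probability seen.
    { tok[NR] = $1; p[NR] = $2; if ($2 > pmax) pmax = $2 }
    END {
      # The cutoff scales with the top token, unlike a fixed threshold.
      cutoff = min_p * pmax
      for (i = 1; i <= NR; i++) if (p[i] >= cutoff) print tok[i], p[i]
    }'
}

# Toy distribution with made-up numbers (not real model output).
printf '%s\n' "the 0.6" "a 0.3" "an 0.0999" "zebra 0.0001" | min_p_filter 0.001
# prints: "the 0.6", "a 0.3", "an 0.0999" (one per line)
```

With `min_p_filter 0.001` the cutoff is 0.001 × 0.6 = 0.0006, so only `zebra` is dropped. Raising it to 0.2 (cutoff 0.12) would keep just `the` and `a`.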