
llama-server ^
  --model Qwen3.5-27B-BF16-00001-of-00002.gguf ^
  --mmproj mmproj-BF16.gguf ^
  --fit on ^
  --host 127.0.0.1 ^
  --port 2080 ^
  --temp 0.8 ^
  --top-p 0.95 ^
  --top-k 20 ^
  --min-p 0.00 ^
  --presence_penalty 1.5 ^
  --repeat_penalty 1.1 ^
  --no-mmap ^
  --no-warmup

The guidelines are a little hard to interpret. At https://huggingface.co/Qwen/Qwen3.5-27B Qwen says to use temp 0.6, presence penalty 0.0, repetition penalty 1.0 for "thinking mode for precise coding tasks" and temp 1.0, presence penalty 1.5, repetition penalty 1.0 for "thinking mode for general tasks." Those recommendations swing wildly between the two modes, and I don't know if printing potato 100 times is considered to be more like a "precise coding task" or a "general task."

When setting up the batch file for some previous tests, I decided to split the difference between 0.6 and 1.0 for temperature and use the larger recommended values for presence and repetition. For this prompt, it probably isn't a good idea to discourage repetition, I guess. But keeping the existing parameters worked well enough, so I didn't mess with them.
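
For anyone who wants to reproduce the "split the difference" arithmetic, here is a rough Python sketch. The preset values are the ones quoted above from the model card; the `blend` and `to_flags` helpers and the dictionary keys are made up for illustration and are not part of llama.cpp or any Qwen tooling. Note that this rule gives a repetition penalty of 1.0, not the 1.1 used in the batch file, so that value must have come from somewhere else.

```python
# Presets as quoted in the comment above (from the Qwen3.5-27B model card).
PRESETS = {
    "precise_coding": {"temp": 0.6, "presence_penalty": 0.0, "repeat_penalty": 1.0},
    "general": {"temp": 1.0, "presence_penalty": 1.5, "repeat_penalty": 1.0},
}


def blend(a, b):
    """Hypothetical helper: midpoint for temperature, larger value for each penalty."""
    return {
        "temp": round((a["temp"] + b["temp"]) / 2, 2),
        "presence_penalty": max(a["presence_penalty"], b["presence_penalty"]),
        "repeat_penalty": max(a["repeat_penalty"], b["repeat_penalty"]),
    }


def to_flags(params):
    """Hypothetical helper: render the blended values as llama-server style flags."""
    return (
        f"--temp {params['temp']} "
        f"--presence-penalty {params['presence_penalty']} "
        f"--repeat-penalty {params['repeat_penalty']}"
    )


blended = blend(PRESETS["precise_coding"], PRESETS["general"])
print(to_flags(blended))
```

Swapping `max` for `min` on the two penalties would give the "don't discourage repetition" variant that probably suits the potato prompt better.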