by CamperBob2
112 days ago
What quant? I just ran the prompt 'Repeat the word "potato" 100 times, numbered' and it worked fine, taking 44 seconds at 24 tokens/second. Command line:
llama-server ^
--model Qwen3.5-27B-BF16-00001-of-00002.gguf ^
--mmproj mmproj-BF16.gguf ^
--fit on ^
--host 127.0.0.1 ^
--port 2080 ^
--temp 0.8 ^
--top-p 0.95 ^
--top-k 20 ^
--min-p 0.00 ^
--presence_penalty 1.5 ^
--repeat_penalty 1.1 ^
--no-mmap ^
--no-warmup
This model seems to be somewhat sensitive to the repeat and/or presence penalties, so that might have caused the looping you saw.
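To make the penalty sensitivity concrete, here is a rough Python sketch of how a repeat penalty and a presence penalty are commonly applied to logits. It assumes the usual formulation (positive logits are divided by the repeat penalty, negative ones multiplied by it, then a flat presence penalty is subtracted for any token already seen). The function name `apply_penalties` and the toy logits are made up for illustration, and the real sampler in llama.cpp may differ in details such as the lookback window.

```python
def apply_penalties(logits, history, repeat_penalty=1.1, presence_penalty=1.5):
    """Return a penalized copy of `logits` (dict: token -> raw logit).

    Hypothetical helper for illustration only, not llama.cpp's API.
    """
    seen = set(history)          # presence is binary: seen at least once
    out = dict(logits)
    for tok in seen:
        if tok not in out:
            continue
        # Repeat penalty: shrink positive logits, push negative ones lower.
        if out[tok] > 0:
            out[tok] /= repeat_penalty
        else:
            out[tok] *= repeat_penalty
        # Presence penalty: flat subtraction for any token already emitted.
        out[tok] -= presence_penalty
    return out


# Toy example: the model slightly prefers "potato", but it was already emitted.
raw = {"potato": 4.0, "tomato": 3.0}
penalized = apply_penalties(raw, history=["potato"] * 5)
print(penalized)
```

With these toy numbers the penalized "potato" logit drops below "tomato", which is the kind of ranking flip that could plausibly interfere with a "repeat this word" prompt.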
|
For Qwen3.5 27B, I got good results with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2, without any penalties. These settings let the model explore (temp, top-p, top-k) without going off the rails (min-p) during reasoning. No loops so far.
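For anyone wondering what min-p actually does in that setup, here is a minimal Python sketch, assuming the usual definition: keep only tokens whose probability is at least min_p times the probability of the most likely token, then renormalize. The function name `min_p_filter` and the toy logits are hypothetical, and the sketch ignores how a real sampler chain orders min-p relative to temperature, top-k and top-p.

```python
import math


def min_p_filter(logits, min_p=0.2):
    """Return {token: prob} after min-p filtering and renormalization.

    Hypothetical helper for illustration only, not llama.cpp's API.
    """
    # Numerically stable softmax over the raw logits.
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Drop every token below min_p * (probability of the top token).
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


# Toy example: two plausible tokens and one unlikely tail token.
kept = min_p_filter({"the": 5.0, "a": 4.0, "zebra": 0.0}, min_p=0.2)
print(sorted(kept))
```

Because the cutoff scales with the top token's probability, the filter is strict when the model is confident and permissive when the distribution is flat, which fits the "explore without going off the rails" description.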