by ihowlatthemoon
53 days ago
I run a setup similar to yours and I've had the best results with Qwen3.5 27B, specifically the Q4_K_M variant: https://unsloth.ai/docs/models/qwen3.5

I use the llama-server that comes with llama.cpp instead of ollama. Here are the exact settings I use:

  llama-server -ngl 99 -c 192072 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0 --sleep-idle-seconds 300 -m Qwen3.5-27B-Q4_K_M.gguf

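For anyone wondering why the two --cache-type flags matter at a context that large, here is a rough back-of-the-envelope sketch of KV cache size. The function name and the architecture numbers (layers, KV heads, head dimension) are placeholders I made up for illustration, not the real Qwen3.5 27B values, and models where only some layers keep a full KV cache will come in lower. The bytes-per-element figures follow from ggml's block layout (32 values plus one fp16 scale per block).

```python
# Bytes per element for llama.cpp cache types: ggml packs q4_0 and q8_0
# in blocks of 32 values plus one fp16 scale (18 and 34 bytes per block).
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim,
                   type_k="f16", type_v="f16"):
    """Estimate KV cache size, assuming every layer stores full K and V."""
    elems_per_token_per_layer = n_kv_heads * head_dim
    k_bytes = elems_per_token_per_layer * BYTES_PER_ELEM[type_k]
    v_bytes = elems_per_token_per_layer * BYTES_PER_ELEM[type_v]
    return n_ctx * n_layers * (k_bytes + v_bytes)


# Hypothetical architecture, NOT taken from the actual model card.
shape = dict(n_layers=64, n_kv_heads=8, head_dim=128)
n_ctx = 192072  # the -c value from the command above

f16 = kv_cache_bytes(n_ctx, **shape)
q4 = kv_cache_bytes(n_ctx, type_k="q4_0", type_v="q4_0", **shape)
print(f"f16: {f16 / 2**30:.1f} GiB, q4_0: {q4 / 2**30:.1f} GiB, "
      f"ratio: {f16 / q4:.2f}x")
```

Whatever the real architecture is, the ratio between f16 and q4_0 stays the same (about 3.56x), which is roughly what makes a context this long fit next to the weights.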
How did you land on that model? Hard to tell if I should be a) going to 3.5, b) going to fewer parameters, c) going to a different quantization/variant.
I didn't consider those other flags either, cool.
Are you having good luck with any particular harnesses or other tooling?