|
|
|
|
|
by Twirrim
102 days ago
|
|
I'm getting about 15-20 tok/s with a 128k context window using the Q3_K_S version. For running the server: $ ./llama.cpp/build/bin/llama-server --host 0.0.0.0 \
--port 8001 \
-hf unsloth/Qwen3.5-35B-A3B-GGUF:Q3_K_S \
--ctx-size 131072 \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.00
|
|