| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Twirrim 102 days ago

I'm getting about 15-20 tok/s with a 128k context window using the Q3_K_S version.

For running the server:

    $ ./llama.cpp/build/bin/llama-server --host 0.0.0.0 \
      --port 8001 \
      -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q3_K_S \
      --ctx-size 131072 \
      --temp 0.6 \
      --top-p 0.95 \
      --top-k 20 \
      --min-p 0.00