by mft_
25 days ago
The 27B model is dense, so it is relatively slow. The 35B-A3B model is marginally weaker but, being MoE, is much faster: roughly 4-8x faster in basic benchmarks on my M1 Max. For comparison, I just ran a couple of quick benchmarks (default settings) with llama-bench: Qwen3.6-35B-A3B at Q6_K_XL gave 858 t/s on pp512 (prompt processing) and 43 t/s on tg128 (token generation). Qwen3.6-27B at Q4_K_XL gave 103 t/s on pp512 and 8 t/s on tg128.
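To make the speedup explicit, here is a quick sketch that just divides the figures quoted above. The dictionary layout and the `speedups` helper are my own illustration, not anything llama-bench produces.

```python
# Throughput in tokens/s, copied from the llama-bench numbers quoted above.
results = {
    "Qwen3.6-35B-A3B Q6_K_XL": {"pp512": 858, "tg128": 43},
    "Qwen3.6-27B Q4_K_XL": {"pp512": 103, "tg128": 8},
}


def speedups(fast, slow):
    """Return the per-test throughput ratio fast/slow (hypothetical helper)."""
    return {test: fast[test] / slow[test] for test in fast}


ratios = speedups(
    results["Qwen3.6-35B-A3B Q6_K_XL"],
    results["Qwen3.6-27B Q4_K_XL"],
)

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}x")
# pp512: 8.3x
# tg128: 5.4x
```

Note that the two runs used different quantizations (Q6_K_XL for the MoE model, Q4_K_XL for the dense one), so these ratios are not a like-for-like comparison.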