|
|
|
|
|
by kgeist
5 days ago
|
|
I have to disagree with most claims. I run Qwen3.6-27b at 260k context and 40-60 tok/sec. It handles most coding problems as well as Sonnet 4.6 under OpenCode on our production tasks. (As an experiment, I run the same prompts for the same issues in parallel for Qwen 3.6 and Sonnet 4.6 and usually see little difference in performance). I see zero degradation from quantization in practice. Settings: RTX 5090, 5-bit weights (Unsloth), FP8 KV cache. Last time I tried running large MoEs on this PC, they had inferior quality at 2-3 bits compared to much smaller dense models at 5-6 bits, and were slower anyway. |
|