Hacker News
by bobim 18 days ago
Could you share some of your hardware details for Qwen 3.6? And are you using the dense or MoE variant?

Sure, I have a 64 GB MBP with an M1 Ultra. The best model for me by far has been the 35B A3B, in particular the Q8_K_XL Unsloth variant. The dense model works, but it's much slower, and I don't really see a difference in quality with a good harness.
What do you use as a harness?
I use oh-my-openagent. It does an incredible job at planning and executing by orchestrating subagents with different roles.
This is also of interest!
Qwen3.6-27B-UD-Q4_K_XL can run at 45t/s with 131k q8 context on an RTX 4090.

That is pretty usable. You could get 65t/s or more with MTP, but only if you drop the context size, which I would advise against.

Results are better with 256k context and a larger quant. However, that's not going to fit on the 4090 you already had lying around for playing Cyberpunk 2077.
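For a rough sense of why the larger quant doesn't fit, here is a back-of-the-envelope sketch of the weight size alone. The bits-per-weight figures are my assumptions for 4-bit-class and 8-bit-class quants, not measured file sizes (mixed-precision quants like the UD ones will differ), and the helper name `weight_gib` is made up for illustration. The KV cache for the context comes on top of this.

```python
# Rough weight-only VRAM estimate. The bits-per-weight values below are
# assumptions for illustration, not measured sizes of any specific GGUF file.
GIB = 1024 ** 3
VRAM_GIB = 24  # RTX 4090


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    for label, bpw in [("4-bit-class (~4.5 bpw, assumed)", 4.5),
                       ("8-bit-class (~8.5 bpw, assumed)", 8.5)]:
        size = weight_gib(27, bpw)
        headroom = VRAM_GIB - size  # negative means the weights alone don't fit
        print(f"{label}: {size:.1f} GiB weights, {headroom:.1f} GiB headroom")
```

Under those assumptions the 4-bit-class weights leave several GiB for the KV cache, while the 8-bit-class weights alone already exceed 24 GiB.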

The MoE models make me rather unhappy. Idk. They feel braindead to me, but YMMV.