

	by bobim 65 days ago
	Could you share some of your hardware details for Qwen 3.6? And are you using the dense or MoE variant?

2 comments

regexorcist 65 days ago

Sure, I have a 64 GB MBP with an M1 Ultra. The best model for me by far has been the 35B A3B, in particular the Q8_K_XL Unsloth variant. The dense model works, but it's much slower, and I don't really see a difference in quality with a good harness.


koyote 65 days ago

What do you use as a harness?


regexorcist 64 days ago

I use oh-my-openagent. It does an incredible job at planning and executing by orchestrating subagents with different roles.


bobim 65 days ago

This is also of interest!


hypfer 65 days ago

Qwen3.6-27B-UD-Q4_K_XL can run at 45t/s with 131k q8 context on an RTX 4090.

That is pretty usable. You could get 65t/s or more with MTP, but only if you drop the context size, which I would advise against.

Results are better with 256k context and a larger quant. However, that's not going to fit on the 4090 you already had lying around for playing Cyberpunk 2077.
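For anyone who wants to sanity-check whether a given quant and context size fits in VRAM, here is a rough back-of-envelope sketch. It only adds up quantized weights, KV cache, and a fixed overhead guess. The function names, the overhead figure, and every architecture number in the example (layer count, KV heads, head dimension, bits per weight) are placeholders I made up for illustration, not the real Qwen3.6 values; read the actual ones from the model's config or GGUF metadata. The one fixed constant is the q8_0 block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 elements). Models that use full attention in only some layers would need the layer count adjusted accordingly.

```python
GIB = 1024 ** 3

# q8_0 stores 32 int8 values plus one fp16 scale per block: 34 bytes / 32 elements.
Q8_0_BYTES_PER_ELEM = 34 / 32


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: float) -> float:
    """Approximate KV cache size: one K and one V vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem


def fits(vram_gib: float, n_params: float, bits_per_weight: float,
         n_layers: int, n_kv_heads: int, head_dim: int, context: int,
         bytes_per_elem: float = Q8_0_BYTES_PER_ELEM,
         overhead_gib: float = 1.0) -> bool:
    """True if weights + KV cache + a guessed fixed overhead fit in VRAM."""
    total = (weight_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context, bytes_per_elem)
             + overhead_gib * GIB)
    return total <= vram_gib * GIB


if __name__ == "__main__":
    # All architecture values below are hypothetical placeholders.
    placeholder = dict(n_params=27e9, bits_per_weight=4.8,
                       n_layers=48, n_kv_heads=4, head_dim=128)
    for ctx in (131_072, 262_144):
        kv = kv_cache_bytes(placeholder["n_layers"], placeholder["n_kv_heads"],
                            placeholder["head_dim"], ctx, Q8_0_BYTES_PER_ELEM)
        print(f"context={ctx}: KV cache ~{kv / GIB:.2f} GiB, "
              f"fits in 24 GiB: {fits(24, context=ctx, **placeholder)}")
```

The KV cache grows linearly with context, which is why doubling to 256k or moving to a larger quant can push a setup that barely fits over the limit.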

The MoE models make me rather unhappy. Idk. They feel braindead to me, but YMMV.
