| HN Mirror



	by dackdel 2 days ago
	What kind of hardware do you need to run qwen3.6-27b?

4 comments

giancarlostoro 2 days ago

Depends on which variant you pull down, but a single 5090 GPU (I know these are insanely expensive, but for context) could run either the Q8 or the Q4_K_M version. It will not fit the 52GB BF16 version, though. For that, any modern Mac with a Pro or better processor and more than 52GB of RAM would suffice (don't forget that the context window needs VRAM too!). As someone else noted, a 128GB model would probably do the trick and give you enough wiggle room to max out the context window.

My Mac only has 16GB of VRAM (24GB total, 8 of which is reserved for the OS), so I have to leave room for the context window. I usually find a model that fits in 5 to 7 GB of VRAM and then push the context window as far as I can.


daemonologist 2 days ago

The benefit of running the full precision version is negligible (probably not even measurable above the benchmark noise floor). Most common for cost-conscious users is to run something around 4-6 bits per weight, which would fit on a 24 or 32 GB card (as you mentioned).
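As a rough illustration of the sizing arithmetic in this exchange, here is a minimal Python sketch that estimates weight size as parameters times bits per weight. The parameter count (27 billion), the average bits-per-weight figures, and the 2 GB headroom for context are assumptions made for illustration, not measured values. Real file sizes differ somewhat (the thread quotes 52GB for BF16, while this estimate gives about 54 GB), so treat the output as a ballpark.

```python
# Rough weight-size estimate: parameters x bits per weight / 8.
PARAMS = 27e9  # assumption: "27b" read as 27 billion parameters

# Assumed average bits per weight; actual quantized files vary by recipe.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: float, bits_per_weight: float, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus a hypothetical context headroom fit in vram_gb."""
    return weight_gb(params, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = weight_gb(PARAMS, bpw)
        print(f"{name}: ~{size:.1f} GB | "
              f"24 GB card: {fits(PARAMS, bpw, 24)} | "
              f"32 GB card: {fits(PARAMS, bpw, 32)}")
```

Under these assumptions the 4-6 bit range lands on a 24 GB card, Q8 needs a 32 GB card, and BF16 fits on neither, which matches what the comments above describe.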


pixelesque 2 days ago

Note that you can raise the limit on how much of the shared (V)RAM the GPU may use (and so shrink what is left for the OS) with:

sudo sysctl iogpu.wired_limit_mb=18800

This lets you use more for the model, but you do need to leave a bit for the OS, obviously!


giancarlostoro 2 days ago

Oh man! I had no idea I could do this at all! What do you usually tweak it to? I feel like 8 GB is probably still a reasonable amount to give the rest of the OS.


pixelesque 2 days ago

I've got a 32 GB MBPro, and I set it to 27700, which I haven't seen a problem with so far.


giancarlostoro 1 day ago

Makes sense. In my case I've got 24GB, so I guess cranking mine to roughly 20 might not hurt?
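For anyone wanting to turn "total RAM minus something for the OS" into a number for that setting, here is a small Python sketch. The helper names are hypothetical, and it assumes 1 GB = 1024 MB. How much to leave for the OS is a judgment call: the 27700 value quoted for a 32 GB machine works out to leaving roughly 5 GB.

```python
def wired_limit_mb(total_ram_gb: float, reserve_for_os_gb: float) -> int:
    """Value for iogpu.wired_limit_mb that leaves reserve_for_os_gb to the OS.

    Assumes 1 GB = 1024 MB; both arguments are in GB.
    """
    if not 0 < reserve_for_os_gb < total_ram_gb:
        raise ValueError("reserve must be positive and smaller than total RAM")
    return int((total_ram_gb - reserve_for_os_gb) * 1024)


def sysctl_command(limit_mb: int) -> str:
    """Build the sysctl command line quoted earlier in the thread."""
    return f"sudo sysctl iogpu.wired_limit_mb={limit_mb}"


if __name__ == "__main__":
    # 24 GB machine, leaving 4 GB for the OS gives a 20 GB limit.
    print(sysctl_command(wired_limit_mb(24, 4)))
```

The script only prints the command; it does not change any system setting itself.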

thanks

I recommend a MacBook with an M5 Max and 128 GB of RAM to run it comfortably and fast. If you have something like a regular M4, go with qwen3.6-35b-a3b: its Mixture of Experts architecture makes it run 2-3x faster than the 27b version.

thanks

I could run it on a 7900 XT with 64k context. You could run it more comfortably on a card with 24 GB of VRAM.


dackdel 1 day ago

thanks


npodbielski 1 day ago

I bought an R9700 for about $1700-1800, and I get something like 800 t/s for prompt processing and about 50 t/s for inference on average. It hurts a bit when you change the prompt, because llama.cpp has to discard the entire cache and then think for 2-5 min depending on the context, but otherwise it is faster than I can read.
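To put the 2-5 minute wait in perspective, here is a back-of-the-envelope Python sketch relating prompt-processing speed to context size. It assumes the prompt speed stays constant at the quoted 800 t/s, which is a simplification, since throughput usually drops as the context grows. The function names are hypothetical.

```python
def reprocess_seconds(context_tokens: int, prompt_tps: float) -> float:
    """Seconds needed to re-read a context from scratch at a fixed prompt speed."""
    if prompt_tps <= 0:
        raise ValueError("prompt_tps must be positive")
    return context_tokens / prompt_tps


def tokens_for_wait(seconds: float, prompt_tps: float) -> int:
    """Context size (in tokens) that would take `seconds` to reprocess."""
    return int(seconds * prompt_tps)


if __name__ == "__main__":
    speed = 800  # tokens per second, as quoted in the comment above
    for minutes in (2, 5):
        print(f"{minutes} min at {speed} t/s ~ "
              f"{tokens_for_wait(minutes * 60, speed)} tokens")
    print(f"64k context: ~{reprocess_seconds(65536, speed):.0f} s")
```

At a constant 800 t/s, a 2-5 minute wait would correspond to roughly 96,000 to 240,000 tokens of context being reprocessed.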
