| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by efficax 38 days ago
	qwen3.6 does a good job locally except it can take 20-30 minutes to respond to a prompt on a mac studio with 32gb of ram.

2 comments

smcleod 38 days ago

Apple Silicon before the M4 does not have matmul instructions which causes the prompt processing to be very slow. It's quite different on the M5, much like using a nvidia GPU

link

2ndorderthought 38 days ago

Yea you probably do want to use a GPU for models of that size.

I also wonder what quantization you are using? If you haven't tried other quants I really would

link

efficax 38 days ago

This is qwen3.6:27b-coding-nvfp4. It's only an M1. If they ever ship an M5 studio with 96GB of ram, that's my next upgrade path for the local llm experiments.

You can get work done with them if you have a harness that can drive outcomes without needing feedback (I've been building a tdd red to green agent harness lately that is very effective if given a good plan upfront). So if you can stand waiting a few days to see results that would only take hours with a model deployed to frontier nvidia hardware, you can get results this way.

link

datadrivenangel 38 days ago

The time delay is the real issue. Much much slower wall clock time.

link