Apple Silicon before the M4 does not have matmul instructions which causes the prompt processing to be very slow. It's quite different on the M5, much like using a nvidia GPU
This is qwen3.6:27b-coding-nvfp4. It's only an M1. If they ever ship an M5 studio with 96GB of ram, that's my next upgrade path for the local llm experiments.
You can get work done with them if you have a harness that can drive outcomes without needing feedback (I've been building a tdd red to green agent harness lately that is very effective if given a good plan upfront). So if you can stand waiting a few days to see results that would only take hours with a model deployed to frontier nvidia hardware, you can get results this way.