There are no practically useful small models, including Qwen 3.5. Yes, the small models of today are a lot more interesting than the small models of 2 years ago, but they remain broadly incoherent beyond demos and tinkering.
I don't think you can make that case for 35B and up, including the 27B dense model. A hypothetical Mac Studio with 512 GB and an M5 Ultra would be able to run the full Qwen 3.5 397B model at a decent speed, and that model is more like 12 months behind the current SoTA.
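Rough back-of-envelope for why 512 GB is the interesting number. This is only a sketch: it assumes memory is dominated by the weights and ignores KV cache and runtime overhead, and the bits-per-weight values are illustrative picks of mine, not measured sizes of any actual quant.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


# Illustrative precisions: 16-bit, 8-bit, and a ~4.5-bit quant (assumed values).
for bits in (16, 8, 4.5):
    print(f"{bits} bits/weight: ~{weight_memory_gb(397, bits):.0f} GB")
```

By this estimate the 16-bit weights would not fit in 512 GB, while 8-bit and lower would, with room left over for context.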
A lot of people got a bad first impression of the 3.5 models for a few different reasons. llama.cpp wasn't able to run them optimally, tool calling was broken, the sampling parameters weren't documented completely, and some poor-quality quants got released. Now that these have all been addressed, they are serious models capable of doing serious business on reasonably accessible hardware.