by CamperBob2 85 days ago
I don't think you can make that case for the 35B-and-up models, or for the 27B dense model. A hypothetical Mac Studio with 512 GB and an M5 Ultra would be able to run the full Qwen 3.5 397B model at a decent speed, and that model is more like 12 months behind the current SoTA.
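
As a rough sanity check on the 512 GB figure, here is a back-of-the-envelope sketch of the memory needed for the weights alone at a few bit widths. The function names, the chosen bit widths, and the decision to ignore KV cache, activations, and OS overhead are all assumptions for illustration, not measurements of any real quant.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB.

    Billions of parameters times bits per weight, divided by 8 bits per byte.
    Ignores KV cache, activations, and runtime overhead (assumption).
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given amount of memory."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # 397B parameters against a hypothetical 512 GB machine.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(397, bits)
        print(f"{bits:>2} bits/weight: {size:6.1f} GB, fits in 512 GB: {fits(397, bits, 512)}")
```

Under these assumptions, the weights fit at 8 bits per weight or below but not at 16, so "the full model" on such a machine would in practice mean a quantized build with some headroom left for context.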

A lot of people got a bad first impression of the 3.5 models for a few different reasons. Llama.cpp wasn't able to run them optimally, tool calling was broken, the sampling parameters weren't fully documented, and some poor-quality quants got released. Now that these have all been addressed, they are serious models capable of doing serious business on reasonably accessible hardware.