Hacker News new | ask | show | jobs
by skeledrew 54 days ago
It'll be a while yet before open models that're good enough will be viable for local use. Heck I've been trying to use the Qwen 3.5 39B A3B on my system, which is modest but no slouch, and have only been able to get ~4.5 tok/s after optimization, and it really runs my system red (fans instantly go crazy). It's just not practical for serious work.
1 comments

I've been using Qwen 3.5 and then 3.6 27b Q4 on Ollama with a single 7900 XTX with the codex cli, and I have been blown away by how genuinely useful it is. I've been able to ask it to do long, multi step problems, and it's able to do things that would have likely taken me days to iron out in a matter of hours, or even minutes sometimes.

I get about 30 tok/s, which is far from blazing, but given the capability it has it is absolutely viable for accelerating my work.