by josephg 5 days ago
What local model do you recommend these days? I’ve got a 4090, mostly sitting idle.
2 comments

The answer to "which AI model", in mid 2026, is always qwen. Depending on your RAM, it's qwen3.5-9b, qwen3.6-35b-a3 in a 3 or 4 bit quant, or qwen3.6-27b. I'm told a bigger model quantized is better than a smaller model unquantized. In 16GB VRAM on 10 year old hardware I can run a 3-bit quant of qwen3.6-35b-a3 at ~30 tokens/sec, and it can do a lot.
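For a rough sense of why the quant level decides what fits, here is a back-of-the-envelope sketch in Python. It assumes the parameter counts implied by the model names (9B, 27B, 35B) and counts the weights only. Real quant formats usually spend a bit more than the nominal bits per weight, and the KV cache and runtime overhead come on top, so treat these numbers as a lower bound rather than a measurement. The helper name `weight_gib` is made up for this example.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and per-block quantization overhead,
    so the real footprint will be somewhat larger.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Parameter counts are assumed from the model names in the comment above.
candidates = [
    ("9B at 16-bit (unquantized)", 9, 16),
    ("27B at 4-bit", 27, 4),
    ("35B at 3-bit", 35, 3),
    ("35B at 4-bit", 35, 4),
]

for label, params, bits in candidates:
    print(f"{label}: ~{weight_gib(params, bits):.1f} GiB of weights")
```

By this estimate the 35B weights at 3 bits come to roughly 12 GiB, which leaves some room on a 16GB card, while 9B at 16-bit is already close to 17 GiB before any context is allocated.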
qwen3.5-9b has been a pretty decent workhorse for me, even with context around 4k.