Hacker News
by toyg 9 days ago
I've been playing with local models for some time, and I've been pleasantly surprised of late. A meager RTX 5080 with 16 GB can give pretty good results now. The ecosystem is also improving pretty quickly.

I have a feeling at some point we will have a "Windows 95" moment (when computing really became personal for the masses) in AI, and things will significantly change shape again.


What local model do you recommend these days? I’ve got a 4090, mostly sitting idle.
The answer to "which AI model", in mid 2026, is always Qwen. Depending on your RAM, it's qwen3.5-9b, qwen3.6-35b-a3 in a 3- or 4-bit quant, or qwen3.6-27b. I'm told a bigger model quantized is better than a smaller model unquantized. In 16 GB of VRAM on 10-year-old hardware I can run a 3-bit quant of qwen3.6-35b-a3 at ~30 tokens/sec, and it can do a lot.
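For a rough feel of why the quantized 35b fits in 16 GB while an unquantized 9b is already tight, here is a back-of-the-envelope sketch. It assumes the parameter counts can be read off the model names, that the weights take exactly params × bits / 8 bytes, and it ignores the KV cache, activations and per-block quantization overhead, so real usage will be higher. The function name `weight_gb` is just made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts are guessed from the model names in this thread (assumption).
candidates = [
    ("9b at 16-bit", 9, 16),
    ("35b at 3-bit", 35, 3),
    ("35b at 4-bit", 35, 4),
    ("27b at 4-bit", 27, 4),
]

for label, params, bits in candidates:
    print(f"{label}: ~{weight_gb(params, bits):.1f} GB of weights")
```

By this estimate the 3-bit 35b lands around 13 GB, under a 16 GB card, while the 4-bit 35b and the 16-bit 9b both come out above 16 GB before any context is allocated.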
Qwen 3.5 9b has been a pretty decent workhorse for me, even with context around 4k.