Hacker News new | ask | show | jobs
by jononor 82 days ago
Have been playing with Qwen3.5 35B. Runs OK nicely on a RTX5060Ti, though I would have liked to have a bit higher thoughput (a 5080/5090 would do). It is seemingly close-but-not-quite-there for code generation / agentic coding. So I am actually quite hopeful that in a few years time, using local LLM models will be quite feasible.
1 comments

A AMD Ryzen AI Max Pro 396 will get 50t/s with Qwen3.5 35B.

In addition, the these local models are very, very, very sensitive to the template used. Make sure it is correct. I was using the wrong template and it would answer but felt like it had a brain worm.

The parameters must also be what is recommended, otherwise they go off the rails.

I get great results now after messing with it for a while. I prefer the 35B model because I enjoy how fast tokens appear at 50t/s, but at around 20-25t/s with the 122B model, it is also completely usable. And that one is very smart.