Hacker News new | ask | show | jobs
by survirtual 78 days ago
A AMD Ryzen AI Max Pro 396 will get 50t/s with Qwen3.5 35B.

In addition, the these local models are very, very, very sensitive to the template used. Make sure it is correct. I was using the wrong template and it would answer but felt like it had a brain worm.

The parameters must also be what is recommended, otherwise they go off the rails.

I get great results now after messing with it for a while. I prefer the 35B model because I enjoy how fast tokens appear at 50t/s, but at around 20-25t/s with the 122B model, it is also completely usable. And that one is very smart.