|
|
|
|
|
by survirtual
78 days ago
|
|
A AMD Ryzen AI Max Pro 396 will get 50t/s with Qwen3.5 35B. In addition, the these local models are very, very, very sensitive to the template used. Make sure it is correct. I was using the wrong template and it would answer but felt like it had a brain worm. The parameters must also be what is recommended, otherwise they go off the rails. I get great results now after messing with it for a while. I prefer the 35B model because I enjoy how fast tokens appear at 50t/s, but at around 20-25t/s with the 122B model, it is also completely usable. And that one is very smart. |
|