|
|
|
|
|
by jononor
82 days ago
|
|
Have been playing with Qwen3.5 35B. Runs OK nicely on a RTX5060Ti, though I would have liked to have a bit higher thoughput (a 5080/5090 would do). It is seemingly close-but-not-quite-there for code generation / agentic coding. So I am actually quite hopeful that in a few years time, using local LLM models will be quite feasible. |
|
In addition, the these local models are very, very, very sensitive to the template used. Make sure it is correct. I was using the wrong template and it would answer but felt like it had a brain worm.
The parameters must also be what is recommended, otherwise they go off the rails.
I get great results now after messing with it for a while. I prefer the 35B model because I enjoy how fast tokens appear at 50t/s, but at around 20-25t/s with the 122B model, it is also completely usable. And that one is very smart.