by tomekowal
6 days ago
With qwen3.6-35b-a3b-mtp in LM Studio on an RTX 3090, I was getting 120 tokens/s. MTP (multi-token prediction) is the key. I tried coding with Pi and it was much faster than Claude, but on anything that wasn't straightforward it did so-so: it either got stuck looping or failed to notice easy-to-spot constraints. For exploring codebases and asking broad questions about them, though, I find it better purely because of its speed.