by tomekowal 6 days ago

With qwen3.6-35b-a3b-mtp running in LM Studio on an RTX 3090, I was getting 120 tokens/s. MTP (multi-token prediction) is the key.

I tried coding with Pi and it was much faster than Claude, but on anything not straightforward it did so-so: it either got stuck in loops or missed easy-to-spot constraints.

But for exploring codebases and asking questions about big stuff I find it better due to sheer speed.