by gpugreg 18 days ago
Those were amazing times. You could vibe code an entire prototype in seconds (200 tps). With Qwen3.6-35B-A3B and MTP, you can program at that speed on a single GPU at home now, but Kimi K2 is of course much smarter at almost 30 times the size.

I'm also looking forward to the Cerebras Kimi K2.6 release, which should be even better at 1000 tps. It is hard to overstate how important speed is for programming. Instead of waiting a few minutes for a task to finish, you get the result almost instantly, and you don't have to context switch from whatever else you were working on while waiting.

I hope they will make it available to regular customers.

1 comment

But too much speed doesn’t let you build up context while the LLM is working; it’s a double-edged sword.