| HN Mirror


	by 0-_-0 76 days ago
	3.6 already supports multi-token generation AFAIK

jbellis 76 days ago

Yes, but not diffusion-based; it's still doing token-at-a-time speculation.

0-_-0 76 days ago

I thought it could do multiple tokens at a time.

sleepyeldrazi 76 days ago

Think of this as another way of achieving that. It theoretically has a higher ceiling on how many tokens it can predict at a time. More importantly, it is a lot more memory efficient during actual inference.

regularfry 76 days ago

There was a chart from the Unsloth folks posted to Reddit in the last couple of days which showed that the draft sweet spot for MTP was 2-3 tokens ahead, depending on the quant. That's not much, and I think this might do a lot better. The whole "provably identical distribution" thing is doing a lot of work in my head, and I don't think that's true of the MTP model in Qwen's architecture.
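
For readers wondering what a "provably identical distribution" guarantee can look like in practice, here is a minimal sketch of the standard speculative-sampling accept/reject rule for a single token. It is an illustration only: nothing in this thread confirms that the method being discussed, or Qwen's MTP head, works this way. The names `speculative_step` and `empirical`, and the toy distributions `p` (target model) and `q` (draft model), are made up for the example.

```python
import random


def speculative_step(p, q, rng):
    """One accept/reject step of speculative sampling (toy, single token).

    p: target-model probabilities over a small vocabulary (hypothetical values)
    q: draft-model probabilities over the same vocabulary (hypothetical values)
    Returns a token index whose distribution is p, not q.
    """
    vocab = range(len(p))
    # The cheap draft model proposes a token.
    x = rng.choices(vocab, weights=q)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q).
    # random.choices normalizes the weights for us.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0]


def empirical(p, q, n=100_000, seed=0):
    """Estimate the output distribution of speculative_step by sampling."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n):
        counts[speculative_step(p, q, rng)] += 1
    return [c / n for c in counts]


# Toy distributions, chosen so the draft model is noticeably wrong.
p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
freqs = empirical(p, q)
print(freqs)  # should be close to p even though proposals come from q
```

The point of the sketch is that the draft only affects how often a proposal is accepted (speed), not what gets sampled: accepted proposals contribute min(p, q) to each token's probability and the residual resampling contributes max(0, p - q), which sum to p.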
