| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mmoskal 456 days ago
	Spec decoding only depends on the tokenizer used. It's transfering either the draft token sequence or at most draft logits to the main model.

2 comments

jasonjmcghee 456 days ago

Could be an lm studio thing, but the qwen3-0.6B model works as a draft model for the qwen3-32B and qwen3-30B-A3B but not the qwen3-235B-A22B model

link

jasonjmcghee 456 days ago

I suppose that makes sense, for some reason I was under the impression that the models need to be aligned / have the same tuning or they'd have different probability distributions and would reject the draft model really often.

link