HN Mirror



	by Zacharias030 275 days ago
	There is no reason that it couldn’t be beneficial for training though.


cubefox 274 days ago

Except that speculative decoding is de facto only an inference-time optimization. But the H-Net architecture from the previous reference, which doesn't require tokens or speculative decoding, does something similar for both inference and training.


Zacharias030 274 days ago

Yes, but the discussion is about Multi-Token Prediction (Gloeckle et al., 2024), which is only incidentally useful for speculative decoding.
