Hacker News
by
Zacharias030
275 days ago
There is no reason that it couldn’t be beneficial for training though.
cubefox
274 days ago
Except that speculative decoding is de facto only an inference-time optimization. But the H-Net architecture from the previous reference, which doesn't require tokens or speculative decoding, does something similar for both inference and training.
Zacharias030
274 days ago
Yes, but the discussion is about Multi-Token Prediction (Gloeckle et al. 2024) which is only incidentally useful for speculative decoding.