Y
Hacker News
new
|
ask
|
show
|
jobs
by
OneDeuxTriSeiGo
39 days ago
As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.