by maherbeg 6 days ago
I wonder if new models will be trained with speculative decoding as a core feature, so that fewer experts are needed per pass.
1 comment

The way I read OP is that it's ultimately highlighting the expense of verifying a possibly-wrong speculated token (in fact, the first wrong token invalidates all subsequent tokens too), and that expense also applies to things like MTP (multi-token prediction) that are a core feature of the model. You can decide you just don't care about matching the accuracy of the original model and skip the verification part altogether, but then you're moving closer to something like a text-diffusion model, with very different tradeoffs involved.
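
To make the "first wrong token invalidates all subsequent tokens" point concrete, here is a minimal sketch of greedy draft verification. It is an illustration only, not any particular library's implementation: `verify_draft` and `toy_target` are hypothetical names, the "target model" is a toy rule, and the per-position calls stand in for what a real system would typically do in one batched forward pass.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a "target model" that returns its greedy next token
# for a given context. Real models return logits; this is a simplification.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(prefix: Sequence[int], draft: Sequence[int],
                 target_next: NextToken) -> List[int]:
    """Accept draft tokens until the first one the target disagrees with.

    At the first mismatch, the target's own token is emitted instead and
    every later draft token is dropped, because those tokens were
    conditioned on a context the target never produced.
    """
    context = list(prefix)
    accepted: List[int] = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # correction from the target model
            return accepted            # the rest of the draft is discarded
        accepted.append(token)
        context.append(token)
    return accepted


def toy_target(context: Sequence[int]) -> int:
    """Toy stand-in for the target model: count upward modulo 10."""
    return (context[-1] + 1) % 10


# The draft gets two tokens right, then guesses 9 where the target wants 5.
# The trailing 6 is thrown away along with the wrong guess.
print(verify_draft([1, 2], [3, 4, 9, 6], toy_target))  # [3, 4, 5]
```

Skipping verification would amount to returning `draft` unchanged, which is where the output stops being guaranteed to match the original model.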