Hacker News new | ask | show | jobs
by HumanOstrich 48 days ago
That is.. inaccurate.
1 comments

How so? I’m not saying most of work doesn’t go into creating the drafting model or enabling a new head on the primary model, but the point is that however cool it is the result is, more weights. Speculative decoding requires code to be aware of how this works at the inference level.