by grumpoholic 105 days ago
With speculative decoding, however, you can use additional models to speed up generation.
1 comment

Yes, because speculation has NEVER bitten us in the ass before, right? Coughs in Spectre

Speculative decoding is just running more hardware to get a faster prediction. Essentially, setting more money on fire if you're being billed per token.
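
For anyone following along, here is a toy sketch of what the greedy variant of speculative decoding does: a cheap draft model guesses a few tokens ahead, and the expensive target model checks those guesses, keeping the longest prefix it agrees with. Everything named here (`target_model`, `draft_model`, the arithmetic inside them, `k`) is made up for illustration and stands in for real networks; the only thing the sketch tries to show is that the output matches plain decoding while the number of target passes drops, at the cost of running a second model.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical expensive model: an arbitrary deterministic rule.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Hypothetical cheap model: agrees with the target except when
    # the prefix length is a multiple of 4, where it guesses wrong.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def plain_decode(target: Model, prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens, n_new  # one target pass per generated token


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal. In a real system all positions
        #    are scored in a single batched forward pass, so count one pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
        else:
            # Every guess was right: the same pass also yields one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    plain, plain_passes = plain_decode(target_model, prompt, 20)
    spec, spec_passes = speculative_decode(target_model, draft_model, prompt, 20)
    print("same output:", plain == spec)
    print("target passes:", plain_passes, "vs", spec_passes)
```

In this toy setup the generated tokens are identical either way, so the token count does not change; what changes is how much total compute runs (draft plus target) versus how many sequential target passes are needed.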