Hacker News new | ask | show | jobs
by salawat 103 days ago
Yes, because speculation has NEVER bitten us in the ass before, right? Coughs in Spectre

Speculative decoding is just running more hardware to get a faster prediction. Essentially, setting more money on fire if you're being billed per token.