by cwyers 321 days ago
So, the way speculative decoding works, the model keeps every drafted token up to the first wrong one and only starts predicting again from that point, so you still get 'is' for free.
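
To make that concrete, here's a rough sketch of the greedy version of the verification step. Everything named here (verify_draft, toy_target, the SENTENCE list) is made up for illustration and not taken from any particular library. A real implementation would score all drafted positions in one batched forward pass of the big model rather than calling it once per position, and sampling-based variants use a probabilistic accept/reject test instead of an exact match.

```python
from typing import Callable, List, Tuple

Token = str


def verify_draft(
    context: List[Token],
    draft: List[Token],
    target_next: Callable[[List[Token]], Token],
) -> Tuple[List[Token], int]:
    """Greedy verification: keep draft tokens until the first mismatch.

    Returns the tokens to append to the context and how many of them
    came from the draft (the "free" ones).
    """
    new_tokens: List[Token] = []
    for n_accepted, guess in enumerate(draft):
        # What the target model would have produced at this position.
        expected = target_next(context + new_tokens)
        if guess != expected:
            # First wrong token: substitute the target's token and stop.
            new_tokens.append(expected)
            return new_tokens, n_accepted
        new_tokens.append(guess)
    # Whole draft matched, so the target contributes one extra token.
    new_tokens.append(target_next(context + new_tokens))
    return new_tokens, len(draft)


# Hypothetical stand-in for the big model: it always continues one fixed sentence.
SENTENCE = ["The", "answer", "is", "42", "."]


def toy_target(prefix: List[Token]) -> Token:
    return SENTENCE[len(prefix)] if len(prefix) < len(SENTENCE) else "<eos>"


# The draft model guessed 'is' correctly, then got the next token wrong.
tokens, kept = verify_draft(["The", "answer"], ["is", "43", "!"], toy_target)
print(tokens, kept)
```

Here the draft's 'is' is accepted as-is, the bad '43' is replaced by the target's own token, and the trailing '!' is thrown away since it was conditioned on the wrong token.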