Hacker News new | ask | show | jobs
by woadwarrior01 421 days ago
That should not be the case. Speculative decoding is trading off compute for memory bandwidth. The model's output is guaranteed to be the same, with or without it. Perhaps there's a bug in the implementation that you're using.