by woadwarrior01
421 days ago
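That should not be the case. Speculative decoding trades extra compute for fewer memory-bandwidth-bound sequential passes through the large model. The model's output is guaranteed to be the same with or without it: identical tokens under greedy decoding, and an identical output distribution when sampling. Perhaps there's a bug in the implementation you're using.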
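The sketch below is a minimal Python illustration of why the output can't change under greedy decoding. Every token that ends up in the output is one the target model itself would have picked at that position. `target_model` and `draft_model` are hypothetical toy functions standing in for real models. This isn't any particular library's implementation, and it leaves out the sampling variant, which uses rejection sampling to preserve the target's distribution.

```python
from typing import Callable, List, Tuple

# A "model" here is a hypothetical stand-in: context tokens -> greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Toy stand-in for the large model (deterministic, purely illustrative)."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    """Toy stand-in for the small draft model: right only some of the time."""
    guess = target_model(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive greedy decoding: one target call per token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verify rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n:
        remaining = n - (len(out) - len(prompt))
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(min(k, remaining)):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks every proposed position. A real system does this
        #    in one batched forward pass, which is where the bandwidth saving is.
        rounds += 1
        accepted, ctx = [], list(out)
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                # First mismatch: keep the target's own token, discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Every draft token matched: the same pass yields one bonus token.
            if len(accepted) < remaining:
                accepted.append(target(ctx))
        out.extend(accepted)
    return out[len(prompt):], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, rounds = speculative_decode(target_model, draft_model, prompt, n=20)
    slow = greedy_decode(target_model, prompt, n=20)
    # Same tokens; usually fewer sequential target rounds than tokens generated.
    print(fast == slow, rounds)
```

The draft model only affects how many tokens are accepted per round. A bad draft costs speed, never correctness. So if outputs differ in practice, suspect the implementation, for example numerical differences between batched and single-token forward passes, or a sampling path that skips the rejection step.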