|
|
|
|
|
by zozbot234
40 days ago
|
|
Speculative decoding is not that useful at scale, it's mostly about making local single-user inference faster. When you're batching multiple inferences together, that's already as fast as the verification you have to perform w/ speculative decoding. |
|