by yorwba
232 days ago
In particular, part of the paper is about dynamically adjusting the number of tokens generated in parallel while maintaining roughly the same output quality as one-token-at-a-time decoding. The other part is about the KV caching strategy they use to speed up parallel decoding further.