by a_e_k
45 days ago
From the linked post, it didn't read like a separate KV cache was needed:

> The draft models seamlessly utilize the target model's activations and share its KV cache, meaning they don't have to waste time recalculating context the larger model has already figured out.