by ramanvarma
231 days ago
skimmed the paper - how well does this plug into real serving stacks (paged KV, vLLM, speculative decoding, caching)? layer-wise top-k chunk voting sounds compatible, but does it conflict with RoPE scaling or sliding-window KV eviction policies?