Hacker News
by
george_123
1098 days ago
This approach to managing the KV cache can work with 4-bit. Imagine the speedup of PagedAttention with quantization.
1 comments
zhisbug
1098 days ago
Yep, it is agonistic to 4-bit. You can deploy a 4-bit model and still use vLLM + PagedAttention to double or even triple your serving throughput.
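For anyone wondering why the two are independent, here is a toy sketch. It is not vLLM's actual code: the class `PagedKVCache` and its methods are made up for illustration, and it assumes a fixed block size of 16 tokens. The point is only that a paged KV cache tracks which fixed-size blocks belong to which sequence, so the precision of the model weights never enters the bookkeeping.

```python
class PagedKVCache:
    """Toy block-table allocator (hypothetical, not vLLM's implementation)."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored so far

    def append_token(self, seq_id: str) -> int:
        """Reserve a KV slot for one new token and return its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        # Allocate a new block only when every block this sequence owns is full.
        if n == len(table) * self.block_size:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size]

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):
    cache.append_token("a")
blocks_used = len(cache.block_tables["a"])  # 20 tokens fit in 2 blocks of 16
```

Nothing in the allocator depends on how the weights are stored, which is why a 4-bit model can sit on top of the same scheme.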
ynniv
1098 days ago
If this were submitted as a new comment it would be at the top of the page.
baobabKoodaa
1098 days ago
You mean like, theoretically, in the future? Or you mean today?
ipsum2
1098 days ago
You probably mean agnostic; agonistic implies the opposite.
zhisbug
1098 days ago
oops typo