Hacker News
by
zhisbug
1098 days ago
Yep, it is agonistic to 4-bit. You can deploy a 4-bit model and still use vLLM + PagedAttention to double or even triple your serving throughput.
3 comments
ynniv
1098 days ago
If this were submitted as a new comment it would be at the top of the page.
baobabKoodaa
1098 days ago
You mean like, theoretically, in the future? Or you mean today?
ipsum2
1098 days ago
You probably mean "agnostic"; "agonistic" implies the opposite.
zhisbug
1098 days ago
oops typo