by zhisbug 1098 days ago
yep, it is agonistic to 4-bit. You can deploy a 4-bit model and still use vLLM + PagedAttention to double or even triple your serving throughput.
3 comments

You mean like, theoretically, in the future? Or you mean today?
You probably mean "agnostic"; "agonistic" implies the opposite.
oops typo