by zhisbug 1099 days ago
This really depends on which GPUs you use. If your GPUs have only a small amount of memory, vLLM will help more.

vLLM addresses the memory bottleneck of storing KV caches, which lets more requests be batched together and hence increases throughput.
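
To get a feel for why KV cache memory matters more on small GPUs, here is a rough back-of-the-envelope sketch. The model shape (40 layers, 40 KV heads, head dimension 128, fp16), the 10 GiB of free memory, and the 2048-token reservation are all illustrative assumptions, not numbers from vLLM itself, and the helper names are made up for this example.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed for one token of one sequence."""
    # Factor of 2: each layer stores one key vector and one value vector.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_cached_tokens(free_gpu_bytes, bytes_per_token):
    """How many tokens of KV cache fit in the memory left after weights."""
    return free_gpu_bytes // bytes_per_token


# Assumed shape of a 13B-class model in fp16 (hypothetical numbers).
per_token = kv_cache_bytes_per_token(num_layers=40, num_kv_heads=40, head_dim=128)

# Assume 10 GiB of GPU memory remains for the KV cache.
free_bytes = 10 * 1024**3
total_tokens = max_cached_tokens(free_bytes, per_token)

# If every request reserves a full 2048-token slot up front,
# only a handful of requests fit in a batch.
reserved_per_request = 2048
concurrent_requests = total_tokens // reserved_per_request

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Tokens that fit:    {total_tokens}")
print(f"Requests at 2048 tokens reserved each: {concurrent_requests}")
```

Under these assumptions each token costs about 800 KiB, so the less free memory you have, the fewer requests can share a batch, and the more it pays to avoid wasting any of that space.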