Hacker News
by
zhisbug
1099 days ago
This really depends on which GPUs you use. If your GPUs have very little memory, vLLM will help more.
vLLM addresses the memory bottleneck of storing KV caches, so more requests fit in a batch and throughput increases.
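
To make the memory point concrete, here is a rough back-of-the-envelope sketch in Python. It does not use vLLM at all; the function names and the model dimensions (40 layers, 40 heads, head size 128, fp16, about 26 GiB of weights) are illustrative assumptions for a 13B-class model, not measurements. It only estimates how many tokens of KV cache fit in whatever memory the weights leave free.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache bytes needed for one token.

    The factor 2 accounts for storing one key and one value vector
    per layer. bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(gpu_mem_gib, weight_gib, per_token_bytes):
    """Tokens of KV cache that fit in memory left after model weights.

    Ignores activations and allocator overhead, so this is an upper bound.
    """
    free_bytes = (gpu_mem_gib - weight_gib) * 1024**3
    return max(0, int(free_bytes // per_token_bytes))


if __name__ == "__main__":
    # Hypothetical 13B-class model dimensions (assumed, not from the thread).
    per_token = kv_bytes_per_token(num_layers=40, num_kv_heads=40, head_dim=128)
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")

    # Compare a smaller and a larger GPU, assuming ~26 GiB of fp16 weights.
    for gpu_gib in (40, 80):
        tokens = max_cached_tokens(gpu_gib, 26, per_token)
        print(f"{gpu_gib} GiB GPU: room for about {tokens} cached tokens")
```

On the smaller GPU the token budget is much tighter, so any KV cache memory lost to fragmentation or over-reservation costs proportionally more batch capacity, which is where reducing that waste pays off most.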