| HN Mirror



	by zhisbug 1099 days ago
	This really depends on which GPUs you use. If your GPUs have a small amount of memory, vLLM will help more: it addresses the memory bottleneck of storing KV caches, so more requests fit into a batch at once, which increases throughput.
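A rough back-of-the-envelope sketch of why KV-cache memory caps throughput. It assumes a hypothetical LLaMA-7B-like shape (32 layers, 32 KV heads, head dim 128, fp16) and 8 GiB of memory left over for the KV cache. The function names are hypothetical and this is not vLLM's API. The "paged" allocator is a simplified model of block-based allocation in the spirit of vLLM's PagedAttention, with an assumed block size of 16 tokens. It ignores model weights, activations, and sequences growing during decoding.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelShape:
    # Hypothetical model dimensions, for illustration only.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # fp16 / bf16


def kv_bytes_per_token(m: ModelShape) -> int:
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_elem


def max_seqs_contiguous(free_bytes: int, m: ModelShape, max_len: int) -> int:
    # Naive allocator: every request reserves max_len token slots up front,
    # even if it only ends up using a fraction of them.
    return free_bytes // (kv_bytes_per_token(m) * max_len)


def max_seqs_paged(free_bytes: int, m: ModelShape, actual_len: int,
                   block_size: int = 16) -> int:
    # Block-based allocator (simplified): a request only holds
    # ceil(actual_len / block_size) fixed-size blocks, so waste is
    # at most block_size - 1 slots per request.
    blocks = math.ceil(actual_len / block_size)
    return free_bytes // (kv_bytes_per_token(m) * blocks * block_size)


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=32, head_dim=128)
    free = 8 * 2**30  # assumed 8 GiB free for the KV cache
    print("KV bytes/token:", kv_bytes_per_token(shape))
    print("contiguous, max_len=2048:", max_seqs_contiguous(free, shape, 2048))
    print("paged, actual_len=300:", max_seqs_paged(free, shape, 300))
```

Under these assumptions each token costs 0.5 MiB of KV cache, so reserving 2048 slots per request fits only 8 concurrent sequences, while allocating just what a 300-token request uses fits 53. The smaller the GPU memory, the more that reservation waste limits the batch size, which is why memory-efficient KV caching matters most there.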