| HN Mirror

	by ben_s 226 days ago
	Once you oversubscribe GPU memory, performance usually collapses. Frameworks like vLLM can explicitly offload things like the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
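
	To put rough numbers on why this matters, here is a back-of-the-envelope sketch of how much KV cache would not fit on the GPU. It uses the usual size estimate (keys plus values, per layer, per KV head, per token). The model shape, weight size, GPU size, and both helper names are hypothetical values picked for illustration, not measurements or anything from vLLM's API.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, dtype_bytes=2):
    """Bytes needed to hold keys and values for num_tokens across all layers."""
    # Factor of 2: one tensor for keys, one for values.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens


def spill_to_cpu_bytes(weights_bytes, kv_bytes, gpu_bytes):
    """Bytes that do not fit on the GPU and would have to live in CPU memory."""
    return max(0, weights_bytes + kv_bytes - gpu_bytes)


GIB = 1024 ** 3

# Hypothetical model shape and hardware, chosen only for illustration.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=200_000)
spill = spill_to_cpu_bytes(weights_bytes=16 * GIB, kv_bytes=kv, gpu_bytes=24 * GIB)
print(f"KV cache: {kv / GIB:.1f} GiB, spill to CPU: {spill / GIB:.1f} GiB")
```

	Whatever ends up in the spill bucket has to be reached over the host interconnect instead of from GPU memory, which is why an application choosing what to offload is a different thing from the hardware paging it transparently.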