Hacker News
by ben_s 177 days ago
Once you oversubscribe GPU memory, performance usually collapses. Frameworks like vLLM can explicitly offload things like the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
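To make the "application-level" part concrete, here is a rough toy sketch in plain Python. It is not vLLM's code or API: the `TieredKVCache` class, its LRU policy, and the block ids are all made up for illustration, and the "GPU" and "CPU" tiers are just dictionaries. The point is only that the application decides which blocks get moved and pays for every swap explicitly, rather than relying on the driver to page memory behind its back.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a bounded "GPU" tier and an unbounded "CPU" tier.

    Hypothetical illustration only; no real GPU memory is involved.
    """

    def __init__(self, gpu_capacity_blocks):
        self.gpu_capacity_blocks = gpu_capacity_blocks
        self.gpu = OrderedDict()  # block_id -> data, least recently used first
        self.cpu = {}             # block_id -> data offloaded from the GPU tier
        self.swap_ins = 0         # times a block had to be fetched back

    def _make_room(self):
        # Explicit policy: offload least recently used blocks to the CPU tier.
        while len(self.gpu) >= self.gpu_capacity_blocks:
            victim_id, victim = self.gpu.popitem(last=False)
            self.cpu[victim_id] = victim

    def put(self, block_id, data):
        if block_id in self.gpu:
            self.gpu[block_id] = data
            self.gpu.move_to_end(block_id)
            return
        self.cpu.pop(block_id, None)  # drop any stale offloaded copy
        self._make_room()
        self.gpu[block_id] = data

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        data = self.cpu.pop(block_id)  # KeyError if the block was never stored
        self.swap_ins += 1             # the visible cost of being over budget
        self._make_room()
        self.gpu[block_id] = data
        return data


cache = TieredKVCache(gpu_capacity_blocks=2)
cache.put("seq0", [0.1, 0.2])
cache.put("seq1", [0.3, 0.4])
cache.put("seq2", [0.5, 0.6])  # over budget: "seq0" is offloaded to the CPU tier
```

Reading "seq0" again would bump `swap_ins` and push another block out, which is the tradeoff: the cost is visible and the policy is yours to tune.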