by 5kg 4 days ago
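Host-to-device bandwidth (RAM to VRAM) is about 128 GB/s per direction for a PCIe Gen 6 x16 link. VRAM-to-GPU bandwidth is about 1.8 TB/s for GDDR7 (RTX 5090) and 8 TB/s for HBM3e (B200). That is roughly a 14x to 62x gap, so it can be faster to recompute the KV cache than to offload it to host RAM and pull it back over PCIe.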
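To put numbers on that gap, here is a rough back-of-the-envelope sketch in Python. It assumes the figures above are peak rates in bytes per second (GB/s and TB/s), and the 10 GB KV cache size is a made-up value for illustration. It only compares data movement time. Whether recompute actually wins also depends on how much compute the prefill needs, which this does not model.

```python
# Peak bandwidths quoted in the comment, expressed in GB/s (bytes, not bits).
PCIE_GEN6_X16_GBPS = 128.0   # host RAM -> VRAM, one direction
GDDR7_5090_GBPS = 1800.0     # VRAM -> GPU on an RTX 5090
HBM3E_B200_GBPS = 8000.0     # VRAM -> GPU on a B200


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Time to move size_gb gigabytes at a sustained bandwidth_gbps GB/s."""
    return size_gb / bandwidth_gbps


def bandwidth_ratio(fast_gbps: float, slow_gbps: float) -> float:
    """How many times more bandwidth the fast path has than the slow one."""
    return fast_gbps / slow_gbps


if __name__ == "__main__":
    kv_cache_gb = 10.0  # hypothetical KV cache size, not from the comment

    links = [
        ("PCIe Gen 6 x16", PCIE_GEN6_X16_GBPS),
        ("GDDR7 (5090)", GDDR7_5090_GBPS),
        ("HBM3e (B200)", HBM3E_B200_GBPS),
    ]
    for name, gbps in links:
        ms = transfer_seconds(kv_cache_gb, gbps) * 1000.0
        print(f"{name:>15}: {ms:8.2f} ms to move {kv_cache_gb:.0f} GB")

    print("GDDR7 vs PCIe:", bandwidth_ratio(GDDR7_5090_GBPS, PCIE_GEN6_X16_GBPS))
    print("HBM3e vs PCIe:", bandwidth_ratio(HBM3E_B200_GBPS, PCIE_GEN6_X16_GBPS))
```

With these inputs, moving 10 GB over PCIe takes about 78 ms, versus about 5.6 ms from GDDR7 and 1.25 ms from HBM3e. Real sustained rates will be lower than the peak figures on all three paths.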