by 5kg 4 days ago
Host-to-device bandwidth (RAM to VRAM) is 128 GB/s for PCIe Gen 6 x16. VRAM-to-GPU bandwidth is 1.8 TB/s for GDDR7 (5090) and 8 TB/s for HBM3e (B200). So it can be faster to recompute the KV cache than to offload it to host RAM and load it back.
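
To put rough numbers on that gap, here is a back-of-the-envelope sketch. It reads the figures above as gigabytes and terabytes per second, and the 4 GB cache size is a made-up value just for illustration. It only compares raw link bandwidth; real recompute time also depends on the model's compute cost, which isn't modeled here.

```python
# Bandwidth figures quoted in the comment, expressed in GB/s.
PCIE_GEN6 = 128       # host RAM -> VRAM
GDDR7_5090 = 1_800    # VRAM -> GPU on a 5090
HBM3E_B200 = 8_000    # VRAM -> GPU on a B200


def transfer_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to move size_gb at the given bandwidth, ignoring latency."""
    return size_gb / bandwidth_gb_s * 1000


# Hypothetical KV cache size, chosen only for illustration.
KV_CACHE_GB = 4.0

links = [
    ("PCIe Gen 6", PCIE_GEN6),
    ("GDDR7 (5090)", GDDR7_5090),
    ("HBM3e (B200)", HBM3E_B200),
]

for name, bandwidth in links:
    ratio = bandwidth / PCIE_GEN6
    ms = transfer_ms(KV_CACHE_GB, bandwidth)
    print(f"{name:>13}: {ms:7.2f} ms for {KV_CACHE_GB} GB ({ratio:.1f}x PCIe)")
```

Going by those figures, on-device memory is about 14x (GDDR7) to 62x (HBM3e) faster than the PCIe link, so reloading an offloaded cache only wins if recomputing it would take longer than the PCIe transfer.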