by proverbialbunny 2170 days ago
In order, the bottlenecks are: GPU RAM, CPU RAM, then PCI-E lanes.

There is a big delay moving data from RAM to VRAM to run a task on the GPU, so much so that if it doesn't all fit in VRAM you'd be better off running the task on the CPU, unless you are very clever about how the data is buffered, which isn't an option for neural networks. Because of this, the PCI-E link is not saturated except when first sending the data to VRAM. PCI-E 3.0 x8 runs at 7880 MB/s, so if your GPU has 16 GB of VRAM, filling it takes about 2 seconds at x8 versus about 1 second at x16. That 1 second difference is nothing when a task typically takes 8+ hours to complete.
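
If you want to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes x16 is exactly double the x8 figure, that 16 GB means 16000 MB, and that VRAM is filled once with no protocol overhead. The function and variable names are just for illustration.

```python
# Throughput figure for PCI-E 3.0 x8 quoted in the comment above.
X8_MB_PER_S = 7880
# Assumption: x16 has exactly twice the lanes, so twice the throughput.
X16_MB_PER_S = 2 * X8_MB_PER_S


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to push size_mb over the link once, ignoring any overhead."""
    return size_mb / rate_mb_per_s


vram_mb = 16 * 1000      # 16 GB of VRAM, in decimal megabytes (assumption)
job_seconds = 8 * 3600   # the "8+ hours" training run, lower bound

t_x8 = transfer_seconds(vram_mb, X8_MB_PER_S)
t_x16 = transfer_seconds(vram_mb, X16_MB_PER_S)
difference = t_x8 - t_x16

print(f"x8:  {t_x8:.2f} s")
print(f"x16: {t_x16:.2f} s")
print(f"difference: {difference:.2f} s")
print(f"share of an 8 h job: {difference / job_seconds:.4%}")
```

The one-time fill comes out to roughly 2 s at x8 and 1 s at x16, which is a tiny fraction of a percent of an 8 hour run. If your workload streams data to the GPU repeatedly instead of loading it once, this estimate no longer applies.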