by treprinum 567 days ago
The K80 was essentially two K40s glued together, but the interconnect between them was barely faster than PCIe, so it didn't offer much benefit: you had to move data between the two internal GPUs anyway.
1 comment

Inference workloads likely won’t care very much. For Llama 3.1 405B in bf16, when you split the model across GPUs by layer, you only need to copy about 32KB per token (the 16384-wide hidden state at 2 bytes per element) before the next GPU can begin processing. That can be done very quickly over PCIe.
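
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes the published Llama 3.1 405B hidden size of 16384 and a link bandwidth of about 16 GB/s (roughly PCIe 3.0 x16, my assumption, not a measured figure). The function names are made up for illustration, and it ignores per-transfer latency, which in practice may dominate for a copy this small.

```python
# Rough estimate of the per-token activation handoff between pipeline stages.
# Assumptions (not from the thread): hidden size 16384 for Llama 3.1 405B,
# and ~16 GB/s of usable link bandwidth (roughly PCIe 3.0 x16).

HIDDEN_SIZE = 16384            # model hidden dimension
BYTES_PER_BF16 = 2             # bf16 is 2 bytes per element
ASSUMED_BANDWIDTH = 16e9       # bytes per second, hypothetical figure


def activation_bytes(hidden_size: int, bytes_per_elem: int, tokens: int = 1) -> int:
    """Bytes that must move to the next GPU for `tokens` tokens."""
    return hidden_size * bytes_per_elem * tokens


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Pure bandwidth-limited transfer time; ignores latency and protocol overhead."""
    return num_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    size = activation_bytes(HIDDEN_SIZE, BYTES_PER_BF16)
    micros = transfer_seconds(size, ASSUMED_BANDWIDTH) * 1e6
    print(f"{size} bytes per token, ~{micros:.2f} microseconds at the assumed bandwidth")
```

Under those assumptions the copy is 32768 bytes and takes on the order of a couple of microseconds per token, which is small next to the time a GPU spends on its share of the layers. Prefill with many tokens at once scales the copy linearly with the token count.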