Hacker News
by wfrew 1398 days ago
Adding network addressing to the GPU interconnect is kind of fascinating

Am I right in thinking the GPU-to-GPU communication is just shuttling chunks of data around to share the inputs/outputs of computations? Or is there some other coordination going on directly between the GPUs with regard to the actual computations each is running? (Or is that still being managed wholly by the CPUs they're attached to?)

2 comments

Both - the first is for sharding tensors across GPUs, the second is to do an all-reduce (e.g. for distributed data parallel, to synchronize gradients).
Maybe it's to speed up multi-GPU matrix multiplies. They're useful for serving/training GPT-3-size models.
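To make the all-reduce part concrete, here is a rough sketch of a ring all-reduce simulated in plain Python, with lists standing in for per-GPU gradient buffers. The function name `ring_all_reduce` and the ring schedule are illustrative assumptions; nothing in the thread says the interconnect uses this particular algorithm, and real libraries pick among several.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) over per-worker buffers.

    `buffers` holds one list per simulated GPU. Every worker ends up
    with the element-wise sum of all inputs. Hypothetical sketch: the
    buffer length must be divisible by the number of workers.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "buffer length must divide evenly into chunks"
    chunk = length // n
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: each worker passes one chunk to its right
    # neighbour, which adds it in. After n - 1 steps worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends are simultaneous.
        msgs = []
        for r in range(n):
            c = (r - step) % n
            msgs.append((c, [data[r][i] for i in span(c)]))
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] += v

    # Phase 2, all-gather: circulate the finished chunks around the ring,
    # overwriting the stale partial sums.
    for step in range(n - 1):
        msgs = []
        for r in range(n):
            c = (r + 1 - step) % n
            msgs.append((c, [data[r][i] for i in span(c)]))
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(span(c), payload):
                data[dst][i] = v

    return data
```

For data-parallel training, each worker would then divide the summed buffer by the number of workers to get the averaged gradient, so every replica applies the same update.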
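As a toy illustration of a multi-GPU matrix multiply, the sketch below splits the right-hand matrix by columns across simulated devices, has each device multiply against its own shard, then concatenates the partial results (the all-gather step). The helpers `column_shards` and `sharded_matmul` are hypothetical names, and column sharding is only one of several possible layouts; the thread does not say which one is used in practice.

```python
def matmul(a, b):
    # Plain dense matrix multiply on lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def column_shards(b, n_devices):
    # Split matrix b into n_devices equal-width column blocks.
    cols = len(b[0])
    assert cols % n_devices == 0, "columns must divide evenly across devices"
    width = cols // n_devices
    return [[row[d * width:(d + 1) * width] for row in b]
            for d in range(n_devices)]


def sharded_matmul(a, b, n_devices):
    # Each simulated device multiplies the full `a` by its own shard of `b`.
    partials = [matmul(a, shard) for shard in column_shards(b, n_devices)]
    # All-gather: stitch the column blocks back together, row by row.
    return [[v for p in partials for v in p[i]] for i in range(len(a))]
```

No device ever holds the whole of `b` here, which is the point when the weights are too large for one GPU's memory; the cost is the communication needed to gather the output blocks.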