by lovesdogsnsnow 1164 days ago
Aren't interconnect bandwidth and latency important for training workloads?

Yes, but to move the needle you need to rely on yet another single vendor, Mellanox (which NVIDIA owns).

So the (slight) differences between the clouds are:

AWS - GPUDirect and high bandwidth (but also high latency)

GCP - 16 GPUs on one machine

Azure/OCI - Mellanox
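
To make the bandwidth vs. latency point concrete, here is a rough sketch using the standard alpha-beta cost model for a ring all-reduce. The two link profiles and every number in them are made up for illustration; they are not measurements of any cloud, and the function name is my own.

```python
def ring_allreduce_seconds(num_gpus, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta estimate of ring all-reduce time.

    A ring all-reduce takes 2 * (num_gpus - 1) steps, and each step sends
    one chunk of size message_bytes / num_gpus over a single link.
    """
    steps = 2 * (num_gpus - 1)
    chunk_bytes = message_bytes / num_gpus
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


# Hypothetical link profiles (assumed values, not vendor figures).
HIGH_BW_HIGH_LAT = {"latency_s": 20e-6, "bandwidth_bytes_per_s": 50e9}
LOW_BW_LOW_LAT = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 25e9}

SMALL_MESSAGE = 64 * 1024   # 64 KiB, e.g. a small gradient bucket
LARGE_MESSAGE = 1024 ** 3   # 1 GiB, e.g. a large fused gradient buffer

if __name__ == "__main__":
    for label, size in (("small", SMALL_MESSAGE), ("large", LARGE_MESSAGE)):
        t_bw = ring_allreduce_seconds(16, size, **HIGH_BW_HIGH_LAT)
        t_lat = ring_allreduce_seconds(16, size, **LOW_BW_LOW_LAT)
        print(f"{label}: high-bandwidth link {t_bw:.6f}s, low-latency link {t_lat:.6f}s")
```

Under these assumed numbers the low-latency link wins on small messages and the high-bandwidth link wins on large ones, which is why a high-bandwidth, high-latency fabric is only a partial answer for training.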