Hacker News new | ask | show | jobs
by wickberg 644 days ago
Frontier is 4x 200Gbps links per node into the interconnect. The interconnect is designed for 540TB/s of bisection bandwidth. <https://icl.utk.edu/files/publications/2022/icl-utk-1570-202...>

Bisection bandwidth is the metric these systems will cite, and impacts how the largest simulations will behave. Inter-node bandwidth isn't a direct comparison, and can be higher at modest node counts as long as you're within a single switch. I haven't seen a network diagram for LambdaLabs, but it looks like they're building off 200Gbps Infiniband once you get outside of NVLink. So they'll have higher bandwidth within each NVLink island, but the performance will drop once you need to cross islands.

1 comments

I thought NVLink is only for communication between GPUs within a single node, no? I don't know what the size of their switches are, but I verified that within a 64 node cluster I got the full advertised 3.2Tbps bandwidth. So that's 4x as fast as 4x200Gbps, but 800Gbps is probably not a bottleneck for any real world workload.