|
|
|
|
|
by p1esk
649 days ago
|
|
LambdaLabs GPU cluster provides internode bandwidth of 3.2Tbps: I personally verified it in a cluster of 64 nodes (8xH100 servers) and they claim it holds for up to 5k GPU cluster. What is the internode bandwidth of Frontier? Someone claimed it's 200Gbps, which, if true, would be a huge bottleneck for some ML models. |
|
Bisection bandwidth is the metric these systems will cite, and impacts how the largest simulations will behave. Inter-node bandwidth isn't a direct comparison, and can be higher at modest node counts as long as you're within a single switch. I haven't seen a network diagram for LambdaLabs, but it looks like they're building off 200Gbps Infiniband once you get outside of NVLink. So they'll have higher bandwidth within each NVLink island, but the performance will drop once you need to cross islands.