"Here’s the funny bit. The UALink 1.0 specification will be done in the third quarter of this year, and that is also when the Ultra Accelerator Consortium will be incorporated to hold the intellectual property and drive the UALink standards. That UALink 1.0 specification will provide a means to connect up to 1,024 accelerators into a shared memory pod. In Q4 of this year, a UALink 1.1 update will come out that pushes up scale and performance even further. It is not clear what transports will be supported by the 1.0 and 1.1 UALink specs, or which ones will support PCI-Express or Ethernet transports.
NVSwitch 3 fabrics using NVLink 4 ports could in theory span up to 256 GPUs in a shared memory pod, but only eight GPUs were supported in commercial products from Nvidia. With NVSwitch 4 and NVLink 5 ports, Nvidia can in theory support a pod spanning up to 576 GPUs but in practice commercial support is only being offered on machines with up to 72 GPUs in the DGX B200 NVL72 system."
Great link. Next Platform in general has great coverage, imo.
I like how they quickly dive into the obvious question: isn't it weird that there's another fabric being made, when CXL is supposed to be arriving any day now? Ethernet (UALink) vs PCI (CXL) forever & ever, perhaps!
What's kind of weird/interesting is that it sounds like the original idea was to scale out Infinity Fabric on PCIe switch hardware, and this sounds like a bit of a pivot to using only the Layer 1 part of Ethernet (which doesn't cover much!):
> The kernel of the Ultra Accelerator Link consortium was planted last December when CPU and GPU maker AMD and PCI-Express switch maker Broadcom said that the xGMI and Infinity Fabric protocols used to link its Instinct GPU memories to each other and also to the memories of CPU hosts using the load/store memory semantics of NUMA links for CPUs would be supported on future PCI-Express switches from Broadcom.
But now using Ethernet signaling. Cool future though:
> AMD is contributing the much broader Infinity Fabric shared memory protocol as well as the more limited and GPU-specific xGMI, to the UALink effort, and all of the other players are agreeing to use Infinity Fabric as the standard protocol for accelerator interconnects.
The notion that HyperTransport-next is back again as an industry-wide accelerator interconnect is kind of sweet! (I'm assuming Infinity Fabric is largely an incremental update, but for all we know it might be all new?)
Ultra Ethernet is competing with InfiniBand (the more flexible, more scalable, but slower and mostly CPU-centric fabric, technically developed by an industry consortium but in practice single-sourced from NVIDIA) and, to a lesser extent, Omni-Path (formerly Intel). Many of the non-IB fabrics still rely on NVIDIA silicon, though, at least for the HBA part.
"Here’s the funny bit. The UALink 1.0 specification will be done in the third quarter of this year, and that is also when the Ultra Accelerator Consortium will be incorporated to hold the intellectual property and drive the UALink standards. That UALink 1.0 specification will provide a means to connect up to 1,024 accelerators into a shared memory pod. In Q4 of this year, a UALink 1.1 update will come out that pushes up scale and performance even further. It is not clear what transports will be supported by the 1.0 and 1.1 UALink specs, or which ones will support PCI-Express or Ethernet transports.
NVSwitch 3 fabrics using NVLink 4 ports could in theory span up to 256 GPUs in a shared memory pod, but only eight GPUs were supported in commercial products from Nvidia. With NVSwitch 4 and NVLink 5 ports, Nvidia can in theory support a pod spanning up to 576 GPUs but in practice commercial support is only being offered on machines with up to 72 GPUs in the DGX B200 NVL72 system."