|
|
|
|
|
by calaphos
198 days ago
|
|
That is comparing an all to all switched Nvlink fabric to a 3D torus for TPUs. Those are completely different network topologies with different tradeoffs. For example the currently very popular Mixture of Experts architectures require a lot of all to all traffic (for expert parallelism) which works a lot better on the switched NVlink fabric as opposed where it doesn't need to traverse multiple links in the torus. |
|