HN Mirror



	by calaphos 245 days ago
	That is comparing an all-to-all switched NVLink fabric to a 3D torus for TPUs. Those are completely different network topologies with different tradeoffs. For example, the currently very popular Mixture of Experts architectures require a lot of all-to-all traffic (for expert parallelism), which works a lot better on the switched NVLink fabric, where traffic doesn't need to traverse multiple links as it does in the torus.
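
To make the hop-count difference concrete, here is a minimal Python sketch, assuming an idealized k x k x k torus with shortest-path routing and uniform all-to-all traffic. It computes the average number of torus links a message crosses. The switched fabric is idealized as a single switch stage between any two endpoints, which is an assumption for comparison, not a description of any specific NVLink system. The function names are hypothetical.

```python
from itertools import product


def ring_distance(a: int, b: int, k: int) -> int:
    """Shortest distance between positions a and b on a ring of k nodes (with wraparound)."""
    d = abs(a - b)
    return min(d, k - d)


def torus_avg_hops(k: int) -> float:
    """Average minimal hop count over all ordered pairs of distinct nodes in a k x k x k torus.

    Assumes shortest-path routing: the hop count is the sum of the ring
    distances in x, y and z.
    """
    nodes = list(product(range(k), repeat=3))
    total_hops = 0
    pairs = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            total_hops += sum(ring_distance(s, d, k) for s, d in zip(src, dst))
            pairs += 1
    return total_hops / pairs


if __name__ == "__main__":
    for k in (4, 8):
        print(f"{k}x{k}x{k} torus ({k**3} chips): {torus_avg_hops(k):.2f} links on average; "
              "idealized switched fabric: 1 switch stage")
```

For k = 4 the average works out to 64/21, a little over 3 links per message, and it grows roughly linearly with k, while the idealized switched path stays at one stage.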

2 comments

zamadatix 245 days ago

This is an underrated point. Comparing just the peak bandwidth is like saying Bulldozer was the far superior CPU of the era because it had a really high frequency ceiling.

markhahn 245 days ago

Really? Fully-connected hardware is unbuildable (at scale), which we already know from the HPC world. Fat trees and dragonfly networks are pretty scalable, but a 3D torus is a very good tradeoff, and respects the dimensionality of reality.
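
The scaling argument can be made concrete by counting links and per-node ports. This is a minimal Python sketch, assuming direct point-to-point links only; switched designs such as fat trees and dragonflies use switch radix to avoid needing n - 1 ports per node and are not modeled here. The function names are hypothetical.

```python
def full_mesh_ports_per_node(n: int) -> int:
    """Ports each node needs to link directly to every other node."""
    return n - 1


def full_mesh_links(n: int) -> int:
    """Point-to-point links in a fully connected network of n nodes."""
    return n * (n - 1) // 2


def torus3d_links(k: int) -> int:
    """Links in a k x k x k torus: 6 ports per node, each link shared by 2 nodes.

    Requires k >= 3; for k = 2 the +1 and -1 neighbours in a dimension are the same node.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    return 3 * k ** 3


if __name__ == "__main__":
    for k in (4, 8, 16):
        n = k ** 3
        print(f"{n:>5} nodes: full mesh {full_mesh_links(n):>9} links, "
              f"{full_mesh_ports_per_node(n)} ports/node; "
              f"3D torus {torus3d_links(k):>6} links, 6 ports/node")
```

The full mesh grows quadratically in links and linearly in ports per node, while the torus keeps a fixed 6 ports per node no matter how large it gets.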

Bisection bandwidth is a useful metric, but is hop count? Per-hop cost tends to be pretty small.
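
A minimal Python sketch of what bisection bandwidth measures for a 3D torus, assuming even k, identical links, and a cut that splits the machine in half along one dimension. The function names and the per-link bandwidth parameter are hypothetical.

```python
from itertools import product


def torus_bisection_links(k: int) -> int:
    """Count the links cut when a k x k x k torus is split into two halves along x.

    Assumes even k >= 4 so the halves are equal and every link is a distinct physical link.
    """
    if k < 4 or k % 2:
        raise ValueError("this sketch assumes an even k >= 4")
    half = k // 2
    cut = 0
    for x, y, z in product(range(k), repeat=3):
        # Each node "owns" its +1 link in every dimension, so each link is counted exactly once.
        neighbours = (((x + 1) % k, y, z), (x, (y + 1) % k, z), (x, y, (z + 1) % k))
        for nx, _, _ in neighbours:
            if (x < half) != (nx < half):
                cut += 1
    return cut


def bisection_bandwidth(k: int, link_bandwidth: float) -> float:
    """Bisection bandwidth in the same units as link_bandwidth (one direction)."""
    return torus_bisection_links(k) * link_bandwidth
```

The count comes out to 2k², since the wraparound links double the number of links crossing each cut. Note that a message crossing h hops also occupies h links along the way, so hop count and bisection bandwidth are not fully independent under heavy all-to-all load.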

zamadatix 245 days ago

Latency (of different types), jitter, and guaranteed bandwidth are the real underlying metrics. Hop count is just one potential driver of those, but different approaches may or may not tackle each of these parts differently.
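
One way to see why hop count is only one driver: a first-order model in which transfer time is a per-hop latency term plus a serialization term. This is a minimal Python sketch; the parameter values in the usage lines are illustrative placeholders, not measurements of any TPU or NVLink system, and the model ignores contention, queueing and jitter entirely.

```python
def transfer_time_us(hops: int, per_hop_ns: float, message_bytes: int,
                     link_gbytes_per_s: float) -> float:
    """First-order transfer time in microseconds: hop latency plus serialization.

    Assumes cut-through forwarding, so the message is serialized once rather than
    at every hop. Ignores contention, queueing and jitter.
    1 GB/s (decimal) is 1 byte per ns.
    """
    hop_term_ns = hops * per_hop_ns
    serialization_ns = message_bytes / link_gbytes_per_s
    return (hop_term_ns + serialization_ns) / 1000.0


if __name__ == "__main__":
    # Illustrative placeholder numbers only.
    for size in (4 * 1024, 64 * 1024 * 1024):
        one = transfer_time_us(1, 100.0, size, 100.0)
        six = transfer_time_us(6, 100.0, size, 100.0)
        print(f"{size:>10} B: 1 hop {one:10.2f} us, 6 hops {six:10.2f} us")
```

With these placeholder numbers the hop term dominates small messages and is negligible for large ones; the jitter and guaranteed-bandwidth effects of many flows sharing links are outside this model.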