by jeffreyames 930 days ago
A cluster of 10k H100 chips is considered very large. The third-fastest supercomputer in the world is Microsoft’s Eagle, with 14k H100s: https://www.top500.org/lists/top500/2023/11/

Ah, gotcha, so it's the fact that it's 10,000 chips in one dedicated cluster that makes it large, as opposed to Azure, which has an order of magnitude more GPUs but rents many of those out.
High performance on a single task requires simultaneous computation and communication between nodes. If there's high latency between nodes, such as between nodes in different data centers, the communication costs can't be masked by computation.
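To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes a toy model in which each step sends its messages one after another, each costing latency plus size over bandwidth, and communication can overlap fully with computation. The function names and every figure are made up for illustration, not measurements of any real cluster.

```python
def comm_time(latency_s, message_bytes, bandwidth_bytes_per_s, messages=1):
    """Toy cost model: messages are sent serially, each paying latency + transfer time."""
    return messages * (latency_s + message_bytes / bandwidth_bytes_per_s)


def exposed_comm(compute_s, comm_s):
    """Communication time left over after overlapping it with computation."""
    return max(0.0, comm_s - compute_s)


# Hypothetical step: 0.1 s of compute, 1000 messages of 1 MB each.
compute_s = 0.1

# Assumed same-cluster link: 5 microseconds latency, 50 GB/s.
local = comm_time(5e-6, 1e6, 50e9, messages=1000)

# Assumed cross-datacenter link: 50 ms latency, 1 GB/s.
remote = comm_time(50e-3, 1e6, 1e9, messages=1000)

print("same cluster, exposed comm (s):", exposed_comm(compute_s, local))
print("cross datacenter, exposed comm (s):", exposed_comm(compute_s, remote))
```

Under these assumed numbers the same-cluster communication fits entirely inside the compute window, while the cross-datacenter case is dominated by latency and leaves the GPUs idle for most of the step.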
I guess Azure's GPUs are spread out too, and latency between datacentres around the world is higher.