by sailingparrot
1609 days ago
At 16k it will definitely be the biggest. As of today, Nvidia has a cluster very slightly smaller than the one you outlined, at ~5k; Microsoft has a few of roughly that size; and Microsoft also built a 10k GPU cluster for OpenAI 2 years ago, but those are V100 GPUs. So, is 6k A100 "bigger" than 10k V100? It depends on exactly how you use them: in a perfect usage scenario yes, slightly; in real life, maybe not.
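
To make the "depends how you use them" part concrete, here is a rough back-of-the-envelope sketch. The per-GPU figures are approximate peak numbers from the public datasheets (my assumption, not measurements), the helper name is made up for illustration, and peak FLOPS ignores interconnect, memory, and utilization entirely, so treat it as an upper bound rather than a benchmark.

```python
# Approximate per-GPU peak throughput in TFLOPS (datasheet values, assumed here).
PEAK_TFLOPS = {
    "V100": {"fp32": 15.7, "fp16_tensor": 125.0},
    "A100": {"fp32": 19.5, "fp16_tensor": 312.0},
}


def cluster_peak_pflops(gpu: str, count: int, precision: str) -> float:
    """Aggregate theoretical peak of `count` GPUs, in PFLOPS (hypothetical helper)."""
    return PEAK_TFLOPS[gpu][precision] * count / 1000.0


# Compare the two cluster sizes mentioned above at two different precisions.
for precision in ("fp32", "fp16_tensor"):
    a100 = cluster_peak_pflops("A100", 6_000, precision)
    v100 = cluster_peak_pflops("V100", 10_000, precision)
    print(
        f"{precision}: 6k A100 = {a100:.0f} PFLOPS, "
        f"10k V100 = {v100:.0f} PFLOPS, ratio = {a100 / v100:.2f}"
    )
```

On these numbers the answer flips with precision: the A100 cluster comes out ahead on tensor-core FP16 and behind on plain FP32, which is why the comparison only holds for a specific workload.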
The point of building this machine is to have a lot of A100s running at the same time, which will unblock a small set of researchers working on time-sensitive, competitive research projects by giving them a slight throughput and latency advantage on the largest problems. The vast majority of users would be better served by a small number of cheaper, slower GPUs that they had exclusive access to for the longest time period they could afford to wait.