by rcxdude
647 days ago
There is a difference between a supercomputer and a large cluster of compute nodes, and it is mainly in the bandwidth between the nodes. I suspect industry uses a larger number of smaller groups of highly connected GPUs for AI work.

I think the main difference is in the target numerical precision: supercomputers such as this one focus on maximizing FP64 throughput, while the GPU clusters used by OpenAI or xAI want to compute in 16-bit or even 8-bit precision (BF16 or FP8).
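
To make the precision gap concrete, here is a rough sketch in plain Python of what rounding an FP64 value down to FP16 or BF16 loses. The helper names `to_fp16` and `to_bf16` are made up for illustration, and the BF16 emulation simply truncates the FP32 bit pattern, which is an assumption on my part: real hardware usually rounds to nearest.

```python
import struct


def to_fp16(x: float) -> float:
    """Round an FP64 value to IEEE 754 binary16 (FP16) and convert it back.

    Uses the standard library's 'e' struct format. Values beyond the FP16
    range (about 65504) raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping only the top 16 bits of the FP32 encoding.

    Assumption: this truncates instead of rounding to nearest, so it is a
    simplification of what real BF16 hardware does.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# FP64 keeps 52 fraction bits, FP16 keeps 10, BF16 keeps 7.
small_step = 1.0 + 2.0 ** -12   # distinguishable from 1.0 in FP64
medium_step = 1.0 + 2.0 ** -10  # smallest step above 1.0 that FP16 can hold

if __name__ == "__main__":
    for name, value in [("1 + 2^-12", small_step), ("1 + 2^-10", medium_step)]:
        print(name, "fp64:", value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

The flip side is range: BF16 keeps the 8 exponent bits of FP32, so it covers far larger magnitudes than FP16 while resolving fewer digits, which is roughly the trade-off AI workloads accept and FP64 simulation codes cannot.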