by avital
1173 days ago
This isn't accurate. The bottleneck in very-large-scale training, BY FAR, is communication between devices. With a million CPUs, the communication cost will be significantly higher than with a thousand A100s (perhaps on the order of 100x or even more). So this is only possible to replicate with very dense, high-compute chips connected by an extremely fast interconnect.
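To make the scaling argument concrete, here is a rough back-of-the-envelope sketch using the textbook alpha-beta cost model for a ring all-reduce. Everything numeric below is an assumption chosen for illustration (payload size, link bandwidths, per-hop latencies), not a measurement, and real clusters use hierarchical or tree collectives that behave differently. The function name `ring_allreduce_seconds` is made up for this example.

```python
def ring_allreduce_seconds(n_devices, payload_bytes, link_bytes_per_s, hop_latency_s):
    """Estimate one ring all-reduce with the alpha-beta cost model.

    A ring all-reduce takes 2 * (n - 1) steps; each step sends a chunk of
    payload / n bytes and pays one hop of latency.
    """
    steps = 2 * (n_devices - 1)
    chunk_bytes = payload_bytes / n_devices
    bandwidth_term = steps * chunk_bytes / link_bytes_per_s
    latency_term = steps * hop_latency_s
    return bandwidth_term + latency_term


# Hypothetical workload: 1e9 float32 gradients (4 bytes each) synced per step.
payload = 4e9

# Assumed figures only: fast accelerator interconnect vs. commodity networking.
gpu_time = ring_allreduce_seconds(1_000, payload, 100e9, 5e-6)
cpu_time = ring_allreduce_seconds(1_000_000, payload, 1e9, 50e-6)

print(f"1,000 accelerators: {gpu_time:.3f} s per all-reduce")
print(f"1,000,000 CPUs:     {cpu_time:.3f} s per all-reduce")
print(f"ratio:              {cpu_time / gpu_time:.0f}x")
```

The point of the sketch is the shape of the formula rather than the exact ratio: the bandwidth term stays roughly constant as the device count grows, but the latency term grows linearly with it, so a million-node ring pays for about two million sequential hops per synchronization.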