by kamikaz1k
656 days ago
Not really familiar with this space, but I think the entire Dojo/DIY strategy was kicked off because Elon didn't want to get cornered on supply or cost by Nvidia. And InfiniBand is effectively an Nvidia technology (via Mellanox), so they wouldn't use it, simply from a strategic POV. Are there other technologies they could have used? Also, the 80us is supposed to be the worst case, where typical is supposed to be <10us. Again, not knowing anything about InfiniBand, what's the typical perf? I tried to Google it, but the people who are talking about it are in the know in ways I'm not. Thanks!
|
It is definitely possible to go much lower than 80us on Ethernet. But obviously it depends on the scale, utilisation, etc.
At the sizes of GPU clusters we're talking about these days (32K GPUs and up), things get tricky.
The main alternative to InfiniBand used in the industry is RoCE (RDMA over Converged Ethernet). Meta has written a lot about it [0].
There are several reasons to avoid InfiniBand, such as cost, availability, vendor lock-in, lack of in-house experience, etc.
Those are some of the reasons why many players are trying hard to make Ethernet work; see Ultra Ethernet [1].
[0] https://engineering.fb.com/2024/08/05/data-center-engineerin...
[1] https://ultraethernet.org/