by iugtmkbdfil834
9 days ago
<< You will always have horrifically slow latency compared to if you pack the servers together in the same place with specialized networking.

Agree about the physics; disagree about the larger point. I am not questioning that servers packed together may achieve an optimal result given how we currently do things. But, and this is my point, what if we didn't do things that way?

<< you cannot get that with distributed training

This is entirely the wrong question to ask. The question to ask is: how could it be adapted to distributed training?

I think the most important problem is that you have to marshal enough compute to be meaningful, and that is going to be more and more difficult as frontier compute requirements grow.