Hacker News
by
PeterisP
1183 days ago
Distributed learning sucks for this type of model. Averaging the results helps if you can do that often, which requires very high bandwidth, i.e. the InfiniBand interconnects between Nvidia pods, which go up to 200 Gbps.
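
To put rough numbers on the bandwidth point, here is a minimal sketch of what one averaging round costs. The function names, the 1-billion-parameter model size, the 2-byte (fp16) parameters, and the 1 Gbps comparison link are all illustrative assumptions, not figures from the thread; the estimate also ignores latency, protocol overhead, and smarter all-reduce schemes.

```python
def average_params(workers):
    """Element-wise mean of the parameter vectors held by each worker."""
    n = len(workers)
    return [sum(values) / n for values in zip(*workers)]


def sync_seconds(num_params, bytes_per_param, link_gbps):
    """Seconds to ship one full copy of the parameters over a link.

    Simplified: raw payload size divided by link speed, nothing else.
    """
    bits = num_params * bytes_per_param * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    # Two hypothetical workers with tiny parameter vectors.
    print(average_params([[1.0, 2.0], [3.0, 4.0]]))

    # Assumed model: 1e9 parameters stored as 2 bytes each.
    for gbps in (200, 1):
        print(f"{gbps} Gbps: {sync_seconds(1_000_000_000, 2, gbps):.2f} s per sync")
```

Under those assumptions a single sync is a fraction of a second at 200 Gbps but many seconds on a 1 Gbps link, so averaging often stops being practical once the link is slow.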