Hacker News new | ask | show | jobs
by PeterisP 1183 days ago
Distributed learning sucks for this type of models, averaging the results helps if you can do that often which requires very high bandwidth - i.e. the Infiniband interconnects between Nvidia pods which go up to 200 Gbps.