Hacker News
by metadat 521 days ago
Thanks for the links, some interesting discussion there.

The second article you linked indicates there will still be intense bandwidth requirements during training, shipping around gradient differentials.

What has changed in the past year? Is this technique looking better, worse, or the same?

1 comment

Yeah, the high bandwidth requirements still remain. Over the past year, more research has moved from fully async training to more constrained setups that still allow for geographically distributed compute. Async Local-SGD (https://arxiv.org/abs/2401.09135) goes for a more standard training objective, comparable to lockstep training. IMO the technique is looking better.
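
For anyone unfamiliar with the Local-SGD family, here is a rough toy sketch of the basic idea: each worker takes several local gradient steps and only then averages weights with the others, so communication happens once per round instead of once per step. Caveats: this is the plain synchronous version on a one-parameter linear fit, not the async algorithm from the linked paper, and every name in it (`local_sgd`, `make_worker_data`, the hyperparameters) is made up for illustration.

```python
import random


def make_worker_data(n_workers=4, n_points=32, true_w=3.0, seed=1):
    """Hypothetical helper: noise-free (x, y) pairs with y = true_w * x per worker."""
    rng = random.Random(seed)
    return [
        [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(n_points))]
        for _ in range(n_workers)
    ]


def local_sgd(worker_data, rounds=20, local_steps=8, lr=0.2, seed=0):
    """Toy synchronous Local-SGD fitting y = w * x with a single scalar weight.

    Returns (final_weight, number_of_sync_rounds).
    """
    rng = random.Random(seed)
    w_global = 0.0
    syncs = 0
    for _ in range(rounds):
        local_weights = []
        for data in worker_data:
            w = w_global  # every worker starts the round from the shared weight
            for _ in range(local_steps):
                x, y = rng.choice(data)
                grad = 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
                w -= lr * grad
            local_weights.append(w)
        # The only communication in the round: average the workers' weights.
        w_global = sum(local_weights) / len(local_weights)
        syncs += 1
    return w_global, syncs


if __name__ == "__main__":
    w, syncs = local_sgd(make_worker_data())
    print(f"weight={w:.4f} after {syncs} syncs")
```

With `rounds=20` and `local_steps=8`, each worker takes 160 gradient steps but the workers only exchange weights 20 times, versus 160 exchanges if they synced every step. That is the bandwidth trade being discussed; it reduces how often you communicate, not how much you ship per exchange.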