by gnabgib 523 days ago
Related:

New Training Technique for Highly Efficient AI Methods (2 points, 5 hours ago) https://news.ycombinator.com/item?id=42690664

DiLoCo: Distributed Low-Communication Training of Language Models (46 points, 1 year ago, 14 comments) https://news.ycombinator.com/item?id=38549337

1 comments

Thanks for the links, some interesting discussion there.

The second article you linked indicates that training will still have intense bandwidth requirements from shipping gradient differentials around.

What has changed in the past year? Is this technique looking better, worse, or the same?

Yeah, the high bandwidth requirements remain. Over the past year, more research has looked at everything from fully async training to more constrained setups that allow for geographically distributed compute. Async Local-SGD (https://arxiv.org/abs/2401.09135) goes for a more standard training objective, comparable to lockstep training. IMO the technique is looking better.
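
For anyone who wants to see the basic idea, here is a toy sketch of synchronous local SGD, where workers take several local steps and only exchange parameter deltas once per round. It is an illustration under simplifying assumptions (a 1-D quadratic loss per worker, plain SGD, simulated workers in one process), not the method from either linked paper. The names `local_sgd`, `targets`, `local_steps`, and `outer_lr` are made up for this example.

```python
# Toy local-SGD sketch: each simulated worker minimizes 0.5 * (w - target)^2
# and only communicates a parameter delta once every `local_steps` steps.

def local_sgd(targets, rounds, local_steps, lr=0.1, outer_lr=1.0):
    w_global = 0.0
    comm_rounds = 0
    for _ in range(rounds):
        deltas = []
        for t in targets:                 # one entry per simulated worker
            w = w_global                  # start from the shared parameters
            for _ in range(local_steps):
                grad = w - t              # gradient of 0.5 * (w - t)^2
                w -= lr * grad
            deltas.append(w_global - w)   # the only value sent over the network
        avg_delta = sum(deltas) / len(deltas)
        w_global -= outer_lr * avg_delta  # outer_lr=1.0 is plain parameter averaging
        comm_rounds += 1
    return w_global, comm_rounds


w, comms = local_sgd(targets=[1.0, 2.0, 3.0], rounds=20, local_steps=10)
print(w, comms)
```

Each worker here takes 200 gradient steps in total but communicates only 20 times, which is where the bandwidth saving comes from. The payload per exchange is still one value per model parameter, so for a large model every sync remains a heavy transfer.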