HN Mirror


by gnabgib 523 days ago

New Training Technique for Highly Efficient AI Methods (2 points, 5 hours ago) https://news.ycombinator.com/item?id=42690664

DiLoCo: Distributed Low-Communication Training of Language Models (46 points, 1 year ago, 14 comments) https://news.ycombinator.com/item?id=38549337


metadat 523 days ago

Thanks for the links, some interesting discussion there.

The second article you linked indicates there will still be intense bandwidth requirements during training, shipping around gradient differentials.

What has changed in the past year? Is this technique looking better, worse, or the same?
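Here is a minimal sketch of DiLoCo-style training on a toy one-parameter regression, to show what "shipping around gradient differentials" means. It makes several assumptions. The inner optimizer is plain SGD (the DiLoCo paper uses AdamW), and the outer step uses Nesterov momentum. The function names, data and hyperparameters are hypothetical choices for the toy problem. Each worker takes `inner_steps` purely local steps. It then sends only the difference between its starting and ending weights, and those deltas are averaged into one outer update.

```python
def mse_grad(w, shard):
    """Gradient of mean squared error for the toy model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def diloco_style_training(shards, rounds, inner_steps, inner_lr, outer_lr, momentum):
    """Toy DiLoCo-style loop: many local steps, then one exchange of parameter deltas."""
    w_global = 0.0   # shared model (a single scalar here)
    velocity = 0.0   # outer Nesterov momentum buffer
    messages = 0     # parameter-sized messages sent by workers
    for _ in range(rounds):
        deltas = []
        for shard in shards:
            w = w_global                      # every worker starts from the shared weights
            for _ in range(inner_steps):      # inner loop: local only, no communication
                w -= inner_lr * mse_grad(w, shard)
            deltas.append(w_global - w)       # "outer gradient" = how far this worker moved
            messages += 1
        outer_grad = sum(deltas) / len(deltas)  # one averaging step (e.g. an all-reduce) per round
        # Nesterov-momentum step on the averaged delta
        velocity = momentum * velocity + outer_grad
        w_global -= outer_lr * (outer_grad + momentum * velocity)
    return w_global, messages


# Two hypothetical workers whose data both follow y = 3x.
shards = [[(0.5, 1.5), (1.0, 3.0)], [(1.5, 4.5), (2.0, 6.0)]]
w, messages = diloco_style_training(shards, rounds=30, inner_steps=10,
                                    inner_lr=0.05, outer_lr=0.7, momentum=0.9)
print(round(w, 3), messages)  # prints: 3.0 60
```

Synchronizing after every inner step would take 30 × 10 × 2 = 600 messages for the same amount of local work. The saving is a factor of `inner_steps`. Each message is still the full size of the model, though, which is why bandwidth stays a concern for large models.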


gaogao 522 days ago

Yeah, the high bandwidth requirements remain. Over the past year, more research has moved from fully async setups toward more constrained cases that still allow geographically distributed compute. Async Local-SGD (https://arxiv.org/abs/2401.09135) targets a more standard training objective, comparable to lockstep training. imo the technique is looking better.
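This is a toy sketch of what "async" means in this setting. It is not the algorithm from arXiv 2401.09135, which adds its own modifications on top. Workers with different speeds finish their local runs at different times. The server applies each worker's parameter delta as soon as it arrives, even though that delta was computed from weights that are already a few updates old. All names, speeds and hyperparameters are hypothetical. Plain SGD is used for both the inner and the outer steps.

```python
import heapq


def mse_grad(w, shard):
    """Gradient of mean squared error for the toy model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def local_run(w_start, shard, inner_steps, inner_lr):
    """Run one worker's inner loop from w_start and return its final weights."""
    w = w_start
    for _ in range(inner_steps):
        w -= inner_lr * mse_grad(w, shard)
    return w


def async_local_sgd(shards, speeds, total_updates, inner_steps, inner_lr, outer_lr):
    """Toy asynchronous local SGD: the server applies each worker's delta when it arrives."""
    w_server, version, max_staleness = 0.0, 0, 0
    # Event: (finish_time, worker_id, weights the worker started from, server version at start)
    events = [(inner_steps / s, i, w_server, version) for i, s in enumerate(speeds)]
    heapq.heapify(events)
    while version < total_updates:
        t, i, w_start, start_version = heapq.heappop(events)
        w_local = local_run(w_start, shards[i], inner_steps, inner_lr)
        # The delta was computed against weights that may be several updates old.
        max_staleness = max(max_staleness, version - start_version)
        w_server -= outer_lr * (w_start - w_local)
        version += 1
        # The worker immediately restarts from the current server weights.
        heapq.heappush(events, (t + inner_steps / speeds[i], i, w_server, version))
    return w_server, max_staleness


# Same hypothetical y = 3x data; worker 1 runs twice as fast as worker 0.
shards = [[(0.5, 1.5), (1.0, 3.0)], [(1.5, 4.5), (2.0, 6.0)]]
w, staleness = async_local_sgd(shards, speeds=[1.0, 2.0], total_updates=60,
                               inner_steps=10, inner_lr=0.05, outer_lr=0.5)
print(round(w, 3), staleness)  # prints: 3.0 2
```

Here the fast worker never waits for the slow one. The cost is that the slow worker's deltas land up to two server versions late. Handling such stale updates without hurting the training objective is the kind of problem the linked paper studies.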
