
Edit: I’m reading this to try and get some sense of the issues - https://www.amazon.science/blog/near-linear-scaling-of-gigan...

What about with fairly frequent, periodic synchronization?

Is there potentially some balance where small enough subsets can be chosen, and disparate workers broadcast their small changes at short enough intervals, that the net gain in learning is still larger than the loss in fit from the workers drifting apart between syncs? I was thinking maybe this algorithm would be 10x less energy efficient but have the benefit of decentralization. Something along those lines.
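To make the question concrete, here is a rough sketch of the kind of scheme I mean. As far as I can tell it is close to what usually gets called local SGD or periodic parameter averaging, but treat that label as my assumption. Everything in it is made up for illustration: the one-parameter model y = w * x, the function names (`make_shards`, `local_sgd`, `mean_squared_error`), the learning rate, and the worker count. Each worker takes `sync_every` steps on its own data, then all workers average their weights. It only counts synchronization rounds; a convex toy like this can't show how much fit a real network would lose between syncs.

```python
import random


def make_shards(num_workers, points_per_worker, true_w=3.0, seed=0):
    """Give each hypothetical worker its own noisy samples of y = true_w * x."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(points_per_worker):
            x = rng.uniform(-1.0, 1.0)
            shard.append((x, true_w * x + rng.gauss(0.0, 0.1)))
        shards.append(shard)
    return shards


def mean_squared_error(w, shards):
    """Mean squared error of weight w over every worker's data."""
    points = [p for shard in shards for p in shard]
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


def local_sgd(shards, total_steps, sync_every, lr=0.1, seed=1):
    """Run SGD on each worker, averaging all weights every `sync_every` steps.

    Returns (averaged final weight, number of synchronization rounds).
    """
    rng = random.Random(seed)
    weights = [0.0] * len(shards)  # one model replica per worker
    sync_rounds = 0
    for step in range(1, total_steps + 1):
        # Local phase: each worker updates its own replica on its own data.
        for i, shard in enumerate(shards):
            x, y = rng.choice(shard)
            grad = 2.0 * (weights[i] * x - y) * x
            weights[i] -= lr * grad
        # Sync phase: the only point where workers communicate.
        if step % sync_every == 0:
            avg = sum(weights) / len(weights)
            weights = [avg] * len(weights)
            sync_rounds += 1
    return sum(weights) / len(weights), sync_rounds


if __name__ == "__main__":
    shards = make_shards(num_workers=4, points_per_worker=50)
    for k in (1, 10, 100):
        w, rounds = local_sgd(shards, total_steps=1000, sync_every=k)
        print(f"sync_every={k} rounds={rounds} mse={mean_squared_error(w, shards):.4f}")
```

With `sync_every=1` this is ordinary synchronous data-parallel SGD; larger values trade communication rounds for replicas that drift apart between averages. The open question is where that trade stops paying off on a real model.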

I’m guessing current training algorithms already do something like this, but since faster synchronization always improves efficiency (the extreme case being that giant single-wafer CPU), OpenAI and others use systems with high interconnect bandwidth.