The network bandwidth between nodes is a bigger limitation than compute. The newest Nvidia cards now come with 400 Gbit/s buses to communicate with each other, even on a single motherboard.
Compared to SETI@home or Folding@home workloads, this would run glacially slowly for AI models.
No, the problem is that with training, you do care about latency, and you need a crap-ton of bandwidth too! Think of the all_gather; think of the gradients! Inference is actually easier to distribute.
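To put rough numbers on "think of the gradients", here is a back-of-the-envelope sketch. The model size (7e9 parameters), 2-byte gradients, 8 nodes, and the 100 Mbit/s home uplink are all made-up assumptions for illustration. The function name is hypothetical too. It only counts bandwidth for a ring all-reduce and ignores latency and compute overlap entirely.

```python
def ring_allreduce_seconds(num_params, bytes_per_param, num_nodes, link_bits_per_s):
    """Bandwidth-only lower bound for one ring all-reduce of the gradients."""
    payload_bytes = num_params * bytes_per_param
    # In a ring all-reduce each node sends 2 * (N - 1) / N times the payload.
    bytes_on_wire = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    return bytes_on_wire * 8 / link_bits_per_s


# Assumed numbers, not measurements: 7e9 params, fp16 gradients, 8 nodes.
home_link = ring_allreduce_seconds(7e9, 2, 8, 100e6)   # 100 Mbit/s uplink
fast_link = ring_allreduce_seconds(7e9, 2, 8, 400e9)   # 400 Gbit/s link
print(f"home: {home_link:.0f} s per sync, fast: {fast_link:.2f} s per sync")
```

Under those assumptions one gradient sync takes about 1960 s (roughly half an hour) over the home link versus about half a second over the fast one, and that cost is paid on every step.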
Yeah, but if you can build topologies based on latency you may get some decent tradeoffs. For example, with N = 1M nodes each doing batch updates in a tree manner, i.e. the all-reduce is layered by latency between nodes.
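A minimal sketch of what "layered by latency" could look like, assuming just two layers and plain Python lists standing in for gradient tensors. The function name and the grouping are hypothetical, and this simulates the data flow in one process rather than doing any real networking.

```python
def hierarchical_allreduce(groups):
    """Sum gradients across all nodes in two layers.

    groups: list of groups; each group is a list of per-node gradient
    vectors (lists of floats). Nodes in one group are assumed to be
    close to each other (low latency).
    """
    # Layer 1: reduce inside each low-latency group (cheap local traffic).
    group_sums = [[sum(vals) for vals in zip(*group)] for group in groups]
    # Layer 2: reduce across groups; only one vector per group has to
    # cross the slow, high-latency links.
    total = [sum(vals) for vals in zip(*group_sums)]
    # Broadcast back down so every node ends up with the same sum.
    return [[list(total) for _ in group] for group in groups]


groups = [
    [[1.0, 2.0], [3.0, 4.0]],                # two nearby nodes
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],   # three nearby nodes elsewhere
]
result = hierarchical_allreduce(groups)
print(result[0][0])
```

The point of the layering is that the slow links carry one vector per group instead of one per node. A deeper tree would repeat the same step per layer.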
https://www.distributed.net/RC5
https://en.wikipedia.org/wiki/RSA_Secret-Key_Challenge
I wonder what kind of performance I would get on an M1 computer today... haha
EDIT: people are still participating in rc5-72...?? https://stats.distributed.net/projects.php?project_id=8