Hacker News
by andy_ppp 1025 days ago
Are there big reasons the training can’t be done SETI@home-style? You could even pay people for the use of their graphics cards, and run the training multiple times on different machines to make sure results weren’t being gamed.
4 comments

There is that; I think it's https://vast.ai/. I'm pretty sure there is also a "community" one I've seen for gen AI, but I can't remember the name.
AI Horde
Yes that's what I was thinking of, thanks! https://aihorde.net
Is this inference, not training?
Yes it's inference only (and usually pretty slow at that).
Training still relies on a very low-latency connection between all the devices. When distributing training across multiple machines, most people use machines in close proximity, connected via InfiniBand, to get the lowest possible latency.

Going from that to the tens to hundreds of milliseconds of latency on the internet, or the hours if you do classical SETI@home, is a big step. There are people working on it though.

GPU memory bandwidth is a limiting factor for how fast training can happen, so it’s much more efficient to train models on locally connected, high-memory GPUs.

Also, gradient updates from all nodes would need to be combined at least every few training steps, and syncing them across the network would take a while.
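
To put a rough number on that sync cost, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: a 1-billion-parameter model, 2-byte (fp16) gradients, a 100 Mbit/s home link with 50 ms latency, and a 400 Gbit/s datacenter link with 5 µs latency. The helper name `sync_seconds` is made up. It only counts sending one full copy of the gradients over one link, so a real all-reduce across many nodes would behave differently.

```python
def sync_seconds(num_params, bytes_per_param, bandwidth_bits_per_s, latency_s):
    """Rough time to push one full gradient copy over a single link."""
    payload_bits = num_params * bytes_per_param * 8
    return latency_s + payload_bits / bandwidth_bits_per_s


# Illustrative assumptions only (not benchmarks of any real setup).
NUM_PARAMS = 1_000_000_000   # hypothetical 1B-parameter model
BYTES_PER_PARAM = 2          # fp16 gradients

# Assumed home connection: 100 Mbit/s, 50 ms latency.
home = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, 100e6, 0.05)

# Assumed datacenter link: 400 Gbit/s, 5 microseconds latency.
datacenter = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, 400e9, 5e-6)

print(f"home link:       {home:.2f} s per sync")
print(f"datacenter link: {datacenter:.4f} s per sync")
print(f"ratio:           {home / datacenter:.0f}x")
```

Under these assumptions, bandwidth dominates rather than latency: the home link needs minutes per sync while the datacenter link needs a fraction of a second, which is why approaches aimed at internet-scale training tend to sync less often or compress what they send.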

There's Petals[0], but the problem seems to be that the entire training data needs to be loaded into VRAM and can't be split up across devices.

[0] https://github.com/bigscience-workshop/petals