| HN Mirror


	by seydor 263 days ago
	How come we don't have AI@Home

1 comments

throwup238 263 days ago

The network bandwidth between nodes is a bigger limitation than compute. The newest Nvidia cards now come with 400 Gbit/s buses to communicate with each other, even on a single motherboard.

Compared to SETI@home or Folding@home, this would be glacially slow for AI models.

fourthark 263 days ago

Seems like training would be a better match, where you need tons of compute but don’t care about latency.

ronsor 263 days ago

No, the problem is that with training, you do care about latency, and you need a crap-ton of bandwidth too! Think of the all_gather; think of the gradients! Inference is actually easier to distribute.
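
A rough back-of-envelope for the gradient traffic mentioned here, as a sketch only. It assumes a ring all-reduce, where each worker sends about 2(N-1)/N times the gradient payload per step, and it ignores latency and overlap with compute. The 7B-parameter model, fp16 gradients, 8 workers, and the 100 Mbit/s home uplink are illustrative assumptions, not figures from the thread; the 400 Gbit/s link is the number quoted above.

```python
def ring_allreduce_seconds(param_count, bytes_per_param, link_bits_per_sec, workers):
    """Bandwidth-only lower bound for one ring all-reduce of the gradients."""
    payload_bytes = param_count * bytes_per_param
    # In a ring all-reduce each worker sends 2*(N-1)/N times the payload.
    traffic_bytes = 2 * (workers - 1) / workers * payload_bytes
    return traffic_bytes * 8 / link_bits_per_sec


PARAMS = 7_000_000_000   # assumed model size
BYTES_PER_PARAM = 2      # assumed fp16 gradients
WORKERS = 8              # assumed worker count

# Assumed 100 Mbit/s home uplink vs. the 400 Gbit/s link quoted in the thread.
home_seconds = ring_allreduce_seconds(PARAMS, BYTES_PER_PARAM, 100e6, WORKERS)
datacenter_seconds = ring_allreduce_seconds(PARAMS, BYTES_PER_PARAM, 400e9, WORKERS)

print(f"home uplink:  {home_seconds:.0f} s per gradient sync")
print(f"400 Gbit/s:   {datacenter_seconds:.2f} s per gradient sync")
```

Under these assumptions a single gradient sync takes on the order of half an hour over the home link and about half a second over the fast link, which is the gap being pointed at.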

meehai 263 days ago

Yeah, but if you can build topologies based on latency you may get some decent tradeoffs. For example, with N=1M nodes each doing batch updates in a tree manner, i.e. the all-reduce is layered by the latency between nodes.
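
One way to picture the layered all-reduce described here, as a toy single-process sketch rather than a real distributed implementation. It assumes the nodes have already been grouped by latency; the function name `hierarchical_allreduce` and the nested-list layout are hypothetical, chosen only for illustration.

```python
def hierarchical_allreduce(groups):
    """Average gradients in two layers.

    groups: list of low-latency groups, each a list of per-node gradient
    vectors (lists of floats). Returns the same nesting, with every node
    holding the global mean.
    """
    # Layer 1: sum inside each low-latency group (cheap local links).
    group_sums = [[sum(vals) for vals in zip(*group)] for group in groups]
    # Layer 2: one leader per group combines partial sums over the slow links.
    total = [sum(vals) for vals in zip(*group_sums)]
    node_count = sum(len(group) for group in groups)
    mean = [value / node_count for value in total]
    # Layer 3: leaders hand the result back to the nodes in their group.
    return [[list(mean) for _ in group] for group in groups]


# Two nearby nodes in one group, one distant node in another.
example = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
result = hierarchical_allreduce(example)
print(result)
```

The point of the layering is that only one message per group crosses the slow links, instead of one per node.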

shaklee3 263 days ago

800Gbps
