| HN Mirror

	by rcxdude 809 days ago
	Training isn't a task that benefits much from being divided into lots of small work units that get processed in parallel without much communication between the nodes. It's naturally almost the complete opposite: it wants very high bandwidth between all the compute units, because each iteration of training calculates the derivative with respect to every weight in the network and then updates all of them. Splitting it up only slows it down: even if you were to distribute training amongst 10x the compute nodes, each of which was 10x faster, if the bandwidth between them drops to even 1/2 you're gonna lose out. This is why all the really big models need a lot of very tightly integrated hardware.
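	To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each step costs compute time plus the time to synchronize gradients with a ring all-reduce, where every node moves about 2(N-1)/N times the parameter size over its link. The function name and all the figures are hypothetical, picked so that synchronization dominates, which is the regime described above; with a compute-heavy workload the comparison can come out differently.

```python
def step_time(work_flops, nodes, flops_per_node, param_bytes, bandwidth):
    """Estimate seconds per training step under a simple additive model.

    Assumes no overlap between compute and communication, and a ring
    all-reduce in which each node transfers 2*(N-1)/N * param_bytes.
    """
    compute = work_flops / (nodes * flops_per_node)
    if nodes > 1:
        comm = 2 * (nodes - 1) / nodes * param_bytes / bandwidth
    else:
        comm = 0.0
    return compute + comm


# Hypothetical workload: 1e14 FLOPs per step, 10B parameters at 4 bytes each.
WORK_FLOPS = 1e14
PARAM_BYTES = 4e10

# Tightly integrated cluster: 8 nodes, 100 GB/s links (made-up figures).
baseline = step_time(WORK_FLOPS, nodes=8, flops_per_node=1e14,
                     param_bytes=PARAM_BYTES, bandwidth=1e11)

# 10x the nodes, each 10x faster, but half the bandwidth.
spread = step_time(WORK_FLOPS, nodes=80, flops_per_node=1e15,
                   param_bytes=PARAM_BYTES, bandwidth=5e10)

print(f"baseline: {baseline:.3f} s/step, spread out: {spread:.3f} s/step")
```

	With these made-up inputs the spread-out setup has 100x the raw compute and still takes longer per step, because the compute term was already small and the sync term roughly doubles when the bandwidth halves.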

1 comment

xyproto 809 days ago

Just like brains.

beepbooptheory 809 days ago

It seems to me like we get a lot of stuff done splitting up the brain work.