HN Mirror



	by frogblast 1887 days ago
	Are there any good resources that describe how existing training workloads are distributed among GPUs in practice (using TensorFlow, PyTorch, or anything else)? I'm curious how the problem effectively gets sliced.

1 comment

The state of the art for the biggest language models (which is effectively where the largest models are) is described here: https://www.microsoft.com/en-us/research/blog/zero-infinity-...
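
For a concrete picture of the simplest way the problem gets sliced, plain data parallelism, here is a minimal sketch in pure Python. It is not ZeRO-Infinity and not how any particular framework implements it. Each "worker" stands in for one GPU, the model is a toy `y = w * x`, and the names `grad_mse`, `shard`, and `data_parallel_grad` are made up for illustration. Every worker computes a gradient on its own slice of the batch, and averaging those gradients plays the role of an all-reduce. ZeRO-style approaches go further by also partitioning optimizer state, gradients, and parameters across workers instead of replicating them, which this sketch does not attempt.

```python
# Toy data-parallel sketch: each "worker" stands in for one GPU.
# All names here are hypothetical; no real framework API is used.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def shard(items, num_workers):
    """Split a batch into equal contiguous slices, one per worker.

    Assumes len(items) is divisible by num_workers.
    """
    size = len(items) // num_workers
    return [items[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_grad(w, xs, ys, num_workers):
    """Each worker computes a local gradient; the mean stands in for all-reduce."""
    x_shards = shard(xs, num_workers)
    y_shards = shard(ys, num_workers)
    local_grads = [grad_mse(w, sx, sy) for sx, sy in zip(x_shards, y_shards)]
    return sum(local_grads) / num_workers

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # single-device gradient
sharded = data_parallel_grad(w, xs, ys, num_workers=4)   # 4 simulated workers
print(full, sharded)
```

With equal-sized slices, the averaged per-worker gradient matches the full-batch gradient, which is why every replica can apply the same update and stay in sync.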