Hacker News
by frogblast, 1887 days ago
Are there any good resources describing how, in practice, existing training workloads are distributed across GPUs (using TensorFlow, PyTorch, or anything else)?
I'm curious how the problem effectively gets sliced.
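For a sense of how the slicing usually starts, the most common scheme is data parallelism: every GPU holds a full copy of the model, receives a different shard of each batch, and the per-GPU gradients are averaged with an all-reduce before every replica applies the same update. The snippet below is a toy single-process simulation of that idea with a one-weight linear model. `split_batch`, `all_reduce_mean` and the other names are hypothetical; real frameworks (e.g. PyTorch's `DistributedDataParallel`) do this across devices using collective communication libraries.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs

def local_gradient(w: float, shard: Batch) -> float:
    # d/dw of the mean squared error for predictions y_hat = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def split_batch(batch: Batch, num_gpus: int) -> List[Batch]:
    # Give each simulated GPU a contiguous, equal-size slice of the batch.
    if len(batch) % num_gpus != 0:
        raise ValueError("batch size must be divisible by num_gpus")
    size = len(batch) // num_gpus
    return [batch[i * size:(i + 1) * size] for i in range(num_gpus)]

def all_reduce_mean(values: List[float]) -> List[float]:
    # Stand-in for an all-reduce collective: every rank ends up with the mean.
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(w: float, batch: Batch, num_gpus: int, lr: float) -> List[float]:
    # Each replica starts from the same weight and computes a gradient on its shard;
    # all replicas then apply the same averaged gradient, so they stay in sync.
    shards = split_batch(batch, num_gpus)
    local_grads = [local_gradient(w, s) for s in shards]
    synced = all_reduce_mean(local_grads)
    return [w - lr * g for g in synced]

batch = [(float(x), 3.0 * x) for x in range(1, 9)]
print(data_parallel_step(0.0, batch, num_gpus=4, lr=0.01))
```

With equal-size shards, the mean of the per-shard gradients equals the full-batch gradient, so each step matches what a single device would compute on the whole batch.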
singhrac, 1887 days ago
The state of the art (SOTA) for the biggest language models (which is effectively where the largest models are) is here:
https://www.microsoft.com/en-us/research/blog/zero-infinity-...
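As background: ZeRO-Infinity builds on DeepSpeed's ZeRO, which removes the redundancy of plain data parallelism by partitioning optimizer state, gradients and parameters across the data-parallel ranks instead of replicating them on every GPU; ZeRO-Infinity additionally offloads these to CPU and NVMe memory. The sketch below simulates only the partitioning idea in one process, using SGD with momentum as a stand-in optimizer. All function names are hypothetical, and this is not DeepSpeed's API or implementation.

```python
from typing import List, Tuple

Shards = List[List[float]]

def partition(flat: List[float], num_ranks: int) -> Shards:
    # Split a flat vector into contiguous, equal-size shards (one per rank).
    # Simplifying assumption: no padding, so the length must divide evenly.
    if len(flat) % num_ranks != 0:
        raise ValueError("length must be divisible by num_ranks")
    size = len(flat) // num_ranks
    return [flat[r * size:(r + 1) * size] for r in range(num_ranks)]

def reduce_scatter_mean(per_rank_grads: List[List[float]], num_ranks: int) -> Shards:
    # Stand-in for a reduce-scatter collective: average the full gradients
    # computed by each rank, then hand rank r only shard r of the result.
    n = len(per_rank_grads)
    mean = [sum(col) / n for col in zip(*per_rank_grads)]
    return partition(mean, num_ranks)

def all_gather(shards: Shards) -> List[float]:
    # Stand-in for an all-gather collective: concatenate every rank's shard.
    return [v for shard in shards for v in shard]

def sharded_momentum_step(param_shards: Shards, momentum_shards: Shards,
                          grad_shards: Shards, lr: float,
                          beta: float) -> Tuple[Shards, Shards]:
    # Each rank updates only the parameters and momentum (optimizer state) it owns,
    # so it stores 1/num_ranks of that state instead of a full copy.
    new_params, new_moms = [], []
    for p, m, g in zip(param_shards, momentum_shards, grad_shards):
        m_next = [beta * mi + gi for mi, gi in zip(m, g)]
        p_next = [pi - lr * mi for pi, mi in zip(p, m_next)]
        new_params.append(p_next)
        new_moms.append(m_next)
    return new_params, new_moms

params = [float(i) for i in range(8)]
ranks = 2
# Each rank computed a full gradient on its own slice of the data.
per_rank_grads = [[1.0] * 8, [3.0] * 8]
p_sh, m_sh = partition(params, ranks), partition([0.0] * 8, ranks)
g_sh = reduce_scatter_mean(per_rank_grads, ranks)
p_sh, m_sh = sharded_momentum_step(p_sh, m_sh, g_sh, lr=0.5, beta=0.9)
print(all_gather(p_sh))  # [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The gathered parameters are identical to an unsharded update; the saving is that no single rank ever holds the full optimizer state, which is what lets this family of techniques fit models that would not fit on one GPU.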