Hacker News
by atty 1165 days ago
You’re correct. I was going to add to my answer that this is combined with DDP for a “3D”-style parallelism that could specifically benefit from the TPU’s network topology, but by the time I finished fixing all the typos (and still missed a few) from writing it on my phone, I had completely forgotten :)