
You’re correct. I was going to add to my answer that this is combined with DDP for a “3D” style of parallelism that could specifically benefit from the TPU’s network topology, but by the time I finished fixing all the typos (and still missed a few) from writing it on my phone, I completely forgot :)