by minimaxir, 541 days ago
It depends on how the parallelism is implemented, e.g. PyTorch's DistributedDataParallel (DDP), which synchronizes gradients across processes:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
It's a rabbit hole I stay away from for pragmatic reasons.
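For a rough idea of what "synchronize gradients" means in the data-parallel case, here is a toy sketch in plain Python. It is not the PyTorch API: `local_gradient`, `all_reduce_mean`, and `ddp_step` are hypothetical names, the model is a single weight `w` fit to `y = w * x`, and the "workers" are just list entries. The assumption is that each worker computes a gradient on its own shard, the gradients are averaged, and every replica applies the same update.

```python
# Toy simulation of data-parallel gradient synchronization (no PyTorch).
# Model: y_hat = w * x, loss = mean((w * x - y) ** 2). All names are illustrative.

def local_gradient(w, shard):
    """Gradient of the mean squared error w.r.t. w on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)

def ddp_step(w, shards, lr):
    """One synchronized step: local gradients, averaging, identical updates."""
    grads = [local_gradient(w, shard) for shard in shards]
    synced = all_reduce_mean(grads)
    # Every replica applies the same averaged gradient, so weights stay in sync.
    return [w - lr * g for g in synced]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two equal-sized shards, one per simulated worker

new_weights = ddp_step(0.0, shards, lr=0.01)
single_worker = 0.0 - 0.01 * local_gradient(0.0, data)
```

With equal-sized shards, the averaged gradient matches the gradient over the full batch, so the replicas end up with the same weight a single worker would have computed. Real setups add process groups, communication backends, and overlap of communication with the backward pass, which is where the rabbit hole starts.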