|
|
|
|
|
by uptownfunk
1159 days ago
|
|
Why does AWS seem to suggest otherwise? >> With only a few lines of additional code, you can add either data parallelism or model parallelism to your PyTorch and TensorFlow training scripts and Amazon SageMaker will apply your selected method for you. SageMaker will determine the best approach to split your model by using graph partitioning algorithms to balance the computation of each GPU while minimizing the communication between GPU instances. SageMaker also optimizes your distributed training jobs through algorithms that are designed to fully utilize AWS compute and network infrastructure in order to achieve near-linear scaling efficiency, which allows you to complete training faster than manual implementations. https://aws.amazon.com/sagemaker/distributed-training/ |
|