by mli
3500 days ago
The results are reported using synchronized SGD with each GPU using batch size 32. More details, such as scripts to reproduce the results and scalability results on various networks (including AlexNet) and various batch sizes, will be available soon. I'll put more technical details, such as implementation details and performance analysis, in my PhD thesis.
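For readers unfamiliar with the term, here is a minimal sketch of what one synchronized SGD step looks like, assuming a toy 1-D linear model and plain Python instead of any real GPU framework. The function names (`grad`, `sync_sgd_step`, `large_batch_step`) and the choice of 4 workers are hypothetical and not taken from the benchmark. It illustrates that when every worker averages over an equal-sized batch of 32 and the gradients are then averaged, the update matches a single step on one batch of size 32 × (number of workers), up to floating-point rounding.

```python
import random


def grad(w, batch):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over the batch, w.r.t. w.
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def sync_sgd_step(w, shards, lr):
    # Each worker computes a gradient on its own shard; the gradients are
    # averaged (an all-reduce in a real system) before one shared update.
    grads = [grad(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)


def large_batch_step(w, batch, lr):
    # Single-worker step on the concatenation of all shards.
    return w - lr * grad(w, batch)


random.seed(0)
num_gpus, per_gpu_batch = 4, 32  # hypothetical worker count; 32 as in the comment

# Synthetic data for y = 3x plus a little noise.
xs = [random.uniform(-1.0, 1.0) for _ in range(num_gpus * per_gpu_batch)]
data = [(x, 3.0 * x + random.gauss(0.0, 0.1)) for x in xs]

# Split the data into one equal-sized shard per worker.
shards = [
    data[i * per_gpu_batch:(i + 1) * per_gpu_batch]
    for i in range(num_gpus)
]

w_sync = sync_sgd_step(0.0, shards, lr=0.1)
w_big = large_batch_step(0.0, data, lr=0.1)
print("effective batch size:", num_gpus * per_gpu_batch)
print("updates agree:", abs(w_sync - w_big) < 1e-9)
```

Because the effective batch size grows with the number of GPUs, the number of parameter updates per epoch shrinks, so convergence per epoch is not guaranteed to match the single-GPU run.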
I could believe you if you told me that the validation loss and test accuracy of the large distributed model remain as good as those of the sequential, single-GPU model after the same total number of epochs. But this is not a given, and if it's not the case, I would find those benchmarks deceptive.