by prajit
3881 days ago
Yes, there are two types of parallelism: model and data. Data parallelism means training copies of the same model on multiple computers, each with a different minibatch, and aggregating the gradients. Model parallelism means hosting different parts of the model on different computers. The two can of course be combined. A good explanation is the One Weird Trick paper: http://arxiv.org/abs/1404.5997
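
To make the distinction concrete, here is a rough single-process sketch in Python. It is a toy under stated assumptions, not how any real framework does it: the "computers" are just function calls, the model is a one-weight regressor, and every name (`grad`, `data_parallel_step`, `model_parallel_forward`) is made up for illustration.

```python
# Toy illustration only: workers and machines are simulated by plain function calls.

def grad(w, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_step(w, xs, ys, num_workers, lr):
    """Data parallelism: same weight everywhere, a different minibatch per worker."""
    shard = len(xs) // num_workers  # assumes len(xs) divides evenly
    grads = [
        grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    avg_grad = sum(grads) / num_workers  # aggregate by averaging
    return w - lr * avg_grad


def model_parallel_forward(x, w1, w2):
    """Model parallelism: each simulated machine holds one part of the model."""
    h = max(0.0, w1 * x)  # "machine A" owns w1 and computes the hidden activation
    return w2 * h         # "machine B" owns w2 and receives h from machine A


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w_new = data_parallel_step(0.0, xs, ys, num_workers=2, lr=0.01)
out = model_parallel_forward(2.0, w1=0.5, w2=3.0)
```

With equal-sized shards, the averaged gradient matches the gradient over the whole batch, which is why data parallelism behaves like training with a larger minibatch. In the model-parallel half, the only thing that crosses the machine boundary is the activation `h`.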