by prajit 3881 days ago
Yes, there are two types of parallelism: model and data. Data parallelism is simply training copies of the same model on multiple computers, each with a different minibatch, and aggregating the gradients. Model parallelism is hosting different parts of the model on different computers. Of course, the two can be combined. A good explanation is the "One Weird Trick" paper: http://arxiv.org/abs/1404.5997
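
To make the data-parallel half concrete, here is a minimal sketch in plain Python. It is a toy simulation, not how any particular framework does it: the "workers" are just list slices, the model is a single weight `w`, and the names `grad_mse` and `data_parallel_step` and the data are made up for illustration.

```python
# Toy simulation of data parallelism. Every "worker" holds the same model
# (a single weight w) and computes a gradient on its own minibatch.
# All names and data here are hypothetical, chosen only for illustration.

def grad_mse(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, shards, lr):
    """One SGD step: average the per-worker gradients, then update once."""
    # On real hardware each of these gradients is computed on a separate machine.
    worker_grads = [grad_mse(w, shard) for shard in shards]
    avg_grad = sum(worker_grads) / len(worker_grads)
    return w - lr * avg_grad

# Data generated from y = 3x, split evenly across two workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(w)  # approaches the true slope of 3
```

With equal-sized shards, averaging the per-worker gradients gives the same result as one gradient over the combined batch, so this step matches ordinary minibatch SGD on a batch twice as large. Model parallelism would instead split the model itself (for example, different layers) across machines, which this sketch does not attempt.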