by romerocesar
3695 days ago
One important difference is model-parallel training. From the FAQ: “DSSTNE instead uses ‘model-parallel training’, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs of data-parallel training.”

https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
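To make the FAQ's description concrete, here is a minimal sketch (Python, standard library only) of one fully connected layer whose weight matrix is split by output rows across several devices. The "devices" are simulated in-process as plain lists, and `shard_rows` / `model_parallel_matvec` are hypothetical names for illustration. This is not DSSTNE's code or API. Each shard computes its slice of the output from the same full input, and concatenating the slices stands in for the all-gather a real multi-GPU system would perform.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Single-device dense layer forward pass (no bias or activation): y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shard_rows(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split the layer's output units (rows of W) into contiguous shards,
    one per simulated device, spreading any remainder over the first shards."""
    base, extra = divmod(len(w), num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        shards.append(w[start:start + size])
        start += size
    return shards


def model_parallel_matvec(w: Matrix, x: Vector, num_devices: int) -> Vector:
    """Hypothetical model-parallel forward pass: every simulated device gets
    the full input x but only its own rows of W, so each computes a slice of y.
    Concatenating the slices plays the role of an all-gather across GPUs."""
    partial_outputs = [matvec(shard, x) for shard in shard_rows(w, num_devices)]
    y: Vector = []
    for part in partial_outputs:
        y.extend(part)
    return y


if __name__ == "__main__":
    w = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1], [0, 1, 0]]
    x = [1, 1, 1]
    print(model_parallel_matvec(w, x, num_devices=2))  # [6, 15, 24, 2, 1]
    print(matvec(w, x))                                 # [6, 15, 24, 2, 1]
```

Because every shard sees the same input and computes exactly the rows it owns, the result is identical to the single-device layer. Roughly speaking, that is why model parallelism avoids the accuracy side of the trade-off: the cost moves into communicating activations between devices at each layer, whereas data-parallel training usually changes the optimization itself (for example, a larger effective batch size or asynchronous updates).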