by romerocesar 3695 days ago
One important difference is model-parallel training. From the FAQ:

DSSTNE instead uses “model-parallel training”, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs of data-parallel training.

https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
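
To make "each layer of the network is split across the available GPUs" a bit more concrete, here is a toy sketch in plain Python. It is not DSSTNE's code or API: the "devices" are just list slices, and the names `matvec`, `split_rows`, and `model_parallel_matvec` are made up for illustration. It assumes one simple way to split a fully connected layer, giving each device a contiguous block of output units, and checks that the sharded result matches the unsharded one.

```python
# Toy illustration of model-parallel execution of one fully connected layer.
# Assumption: the layer is split by output unit (rows of W). No GPU is involved;
# each "device" is simulated by a slice of the weight matrix.

def matvec(weights, x):
    """Compute y = W x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def split_rows(weights, n_devices):
    """Partition the rows of W into contiguous shards, one per device."""
    base, extra = divmod(len(weights), n_devices)
    shards, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)  # spread any remainder over the first shards
        shards.append(weights[start:start + size])
        start += size
    return shards

def model_parallel_matvec(weights, x, n_devices):
    """Each device computes its slice of the output; the slices are concatenated."""
    out = []
    for shard in split_rows(weights, n_devices):
        # On real hardware these per-shard products would run concurrently.
        out.extend(matvec(shard, x))
    return out

W = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
x = [1, 0, -1]

full = matvec(W, x)
sharded = model_parallel_matvec(W, x, n_devices=2)
assert full == sharded
```

Every shard works on the same input and the same single copy of the weights, which is why this style of parallelism does not change what is computed. The cost in a real system is the communication needed to gather the output slices, which this sketch does not model.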