by dave168 3590 days ago
CNTK is great at scaling out beyond a single machine. The paper didn't benchmark that; it only tested single-box performance.
1 comment

Realistically, most people barely get to multiple GPUs, let alone multiple machines. You're more likely to do hyperparameter tuning across machines before you do distributed training.