Hacker News
by piiswrong 3501 days ago
Amazon probably used P2 instances because they want to advertise them. We can get almost linear speedup on 10 machines with 8 M40 GPUs each using MXNet. The batch size increases linearly with the number of machines, but empirically this doesn't hurt convergence, at least on ImageNet.
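Why the batch grows with the number of machines can be sketched as synchronous data-parallel training. Each worker computes a gradient on its own shard, the gradients are averaged, and every replica applies the same update. The effective (global) batch is therefore the per-worker batch times the number of workers. With equal-sized shards, the averaged gradient is exactly the full-batch gradient. This is a toy pure-Python sketch on linear regression, not how MXNet implements it. The per-worker batch of 32, the 80 workers (10 machines x 8 GPUs), the learning rate, and the toy data are hypothetical values chosen for illustration.

```python
import random


def grad_mse(w, b, batch):
    """Gradient of the mean squared error of y ~ w*x + b over a batch of (x, y) pairs."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def data_parallel_step(w, b, shards, lr):
    """One synchronous data-parallel gradient step.

    Each worker computes a gradient on its own shard. The gradients are then
    averaged (what an all-reduce would do across machines), and every replica
    applies the same update, so all copies of (w, b) stay identical.
    """
    grads = [grad_mse(w, b, shard) for shard in shards]
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    return w - lr * gw, b - lr * gb


# Hypothetical setup: 10 machines x 8 GPUs = 80 workers, 32 samples per worker.
random.seed(0)
machines, gpus_per_machine, per_worker_batch = 10, 8, 32
workers = machines * gpus_per_machine
global_batch = workers * per_worker_batch  # grows linearly with the number of machines

# Toy data drawn from y = 3x + 1 plus a little noise.
xs = [random.uniform(-1.0, 1.0) for _ in range(global_batch)]
data = [(x, 3.0 * x + 1.0 + random.gauss(0.0, 0.1)) for x in xs]
shards = [data[i * per_worker_batch:(i + 1) * per_worker_batch] for i in range(workers)]

# For brevity the same global batch is reused every step; real training
# would draw a fresh batch (and fresh shards) each iteration.
w, b = 0.0, 0.0
for _ in range(200):
    w, b = data_parallel_step(w, b, shards, lr=0.5)

print(global_batch)  # 2560
print(round(w, 2), round(b, 2))
```

Doubling the number of machines while keeping `per_worker_batch` fixed doubles `global_batch`. That is the linear batch-size growth described above. Whether the larger batch hurts convergence is an empirical question, and this toy does not settle it.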

I mean, who cares about AlexNet anymore? It's 2016 already. It trains in under 2h on a single machine, so distributing it doesn't make much sense.


Publish those numbers with the sample code to reproduce them. Your first paragraph is enough for an awesome white paper/use case to drive adoption. Don't let silly AWS internal politics get in the way if you work there. Find a workaround.

Amazon is at its best when it's customer obsessed and at its worst when it puts politics first.

All IMO of course.

2 hours to train AlexNet on a single machine? Link please.
https://developer.nvidia.com/cudnn
Alex did it on two GTX 580s in 2012. It took him a week. It's 60x faster now, even compared to a K40.
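This takes the comment's figures at face value. It also assumes the 60x speedup applies to the original one-week run, although the comment does not say exactly which baseline the 60x is measured against. On those assumptions, the arithmetic is:

$$
1\ \text{week} = 7 \times 24\ \text{h} = 168\ \text{h}, \qquad \frac{168\ \text{h}}{60} = 2.8\ \text{h}
$$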