HN Mirror



	by piiswrong 3501 days ago
	Amazon probably used P2 because they want to advertise it. We can get almost linear speedup on 10 machines with 8 M40 GPUs each using MXNet. Batch size increases linearly with the number of machines, but empirically that doesn't hurt convergence, at least on ImageNet. And who cares about AlexNet anymore? It's 2016 already. It trains in under 2 hours on a single machine, so distributing it doesn't make much sense.
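To make the scaling claim concrete, here is a minimal sketch of the bookkeeping behind synchronous data-parallel training: each machine keeps a fixed batch, so the global batch grows linearly with the machine count, and "almost linear speedup" means the scaling efficiency stays close to 1. The function names and every number below are hypothetical illustrations, not measurements from this thread and not MXNet API calls.

```python
def global_batch_size(per_machine_batch: int, num_machines: int) -> int:
    """Global batch when every machine processes its own fixed-size batch."""
    return per_machine_batch * num_machines


def scaling_efficiency(single_machine_throughput: float,
                       cluster_throughput: float,
                       num_machines: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    ideal_throughput = single_machine_throughput * num_machines
    return cluster_throughput / ideal_throughput


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the calculation.
    machines = 10
    print("global batch:", global_batch_size(256, machines))
    print("efficiency:", scaling_efficiency(1000.0, 9500.0, machines))
```

An efficiency near 1.0 corresponds to the near-linear speedup described above; whether the larger global batch hurts convergence is an empirical question this sketch does not address.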

2 comments

oneshot908 3500 days ago

Publish those numbers with the sample code to reproduce them. Your first paragraph is enough for an awesome white paper/use case to drive adoption. Don't let silly AWS internal politics get in the way if you work there. Find a workaround.

Amazon is at its best when it's customer obsessed and at its worst when it puts politics first.

All IMO of course.


p1esk 3501 days ago

2 hours to train AlexNet on a single machine? Link please.


piiswrong 3501 days ago

https://developer.nvidia.com/cudnn Alex did it on 2x GTX 580 in 2012. It took him a week. It's 60x faster now, even compared to a K40.
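As a rough sanity check on those figures, the sketch below converts a one-week baseline and a 60x speedup into hours. It assumes the 60x applies to the original one-week run, which is only one reading of the comment (it could also mean 60x relative to a K40); the helper name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def hours_after_speedup(baseline_hours: float, speedup: float) -> float:
    """Training time left after dividing the baseline by a speedup factor."""
    return baseline_hours / speedup


if __name__ == "__main__":
    # One week divided by 60 comes to roughly 2.8 hours.
    print(hours_after_speedup(HOURS_PER_WEEK, 60))
```

Under that reading the result is in the same ballpark as the two-hour claim, not an exact match.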
