HN Mirror


by argonaut 3751 days ago

The difference is between training taking a week, and training taking 10 weeks.

It takes about a week to train a standard AlexNet model on ImageNet with 1 GPU (and AlexNet is pretty far from the state of the art).

It takes 4 GPUs 2 weeks to train an image classifier marginally below the state of the art on ImageNet (http://torch.ch/blog/2016/02/04/resnets.html): the 101-layer deep residual network. That would take 20 weeks on an ensemble of CPUs. (The state of the art is 152 layers; I don't have the numbers, but I'd guesstimate 3-4 weeks to train it on 4 GPUs.)
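The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is a minimal sketch that assumes a flat 10x CPU slowdown (the ratio implied by "a week vs. 10 weeks"). For the 152-layer guess it also makes the naive, hypothetical assumption that training time grows linearly with depth. `cpu_weeks` and `scale_by_depth` are illustrative names, not from any library.

```python
# Assumption: CPU training is ~10x slower than GPU training,
# the ratio implied by "a week vs. 10 weeks".
CPU_SLOWDOWN = 10.0


def cpu_weeks(gpu_weeks: float, slowdown: float = CPU_SLOWDOWN) -> float:
    """Estimate wall-clock weeks on CPUs from wall-clock weeks on GPUs."""
    return gpu_weeks * slowdown


def scale_by_depth(weeks: float, from_layers: int, to_layers: int) -> float:
    """Hypothetical: assume training time grows linearly with layer count."""
    return weeks * to_layers / from_layers


# AlexNet on 1 GPU: ~1 week, so roughly 10 weeks on CPUs.
alexnet_cpu = cpu_weeks(1.0)
# ResNet-101 on 4 GPUs: ~2 weeks, so roughly 20 weeks on CPUs.
resnet101_cpu = cpu_weeks(2.0)
# ResNet-152 on 4 GPUs, extrapolated linearly from the ResNet-101 figure.
resnet152_gpu = scale_by_depth(2.0, 101, 152)

print(f"AlexNet on CPUs:    {alexnet_cpu:.0f} weeks")
print(f"ResNet-101 on CPUs: {resnet101_cpu:.0f} weeks")
print(f"ResNet-152 on GPUs: {resnet152_gpu:.1f} weeks")
```

The linear-in-depth extrapolation lands at about 3 weeks, the low end of the 3-4 week guess. Per-layer cost is not uniform across a real network, so treat this as a rough estimate only.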