by dumitrue 4841 days ago
Actually, if you read Geoff Hinton's paper on how they won the ImageNet challenge, you'll see they used 2 GPUs. Whether that qualifies as "custom hardware" is debatable, but it's certainly striking that you can get performance about as good as the 16k-core result. The results and the models are not directly comparable, but still.

One limitation of GPUs is that they have comparatively little on-board memory (6GB max for now), and moving data in and out of it is expensive because of the communication costs.

There's also the idea that training on 16k cores might get you something interesting from a learning perspective. You essentially have to train a bunch of different models, which have different parameters and potentially see different subsets of your training set, so you end up performing some sort of model averaging (the training procedure does it for you). In essence, by doing distributed gradient descent you are potentially training a clever ensemble of networks, which arguably could give better generalization.
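
To make the averaging idea concrete, here's a toy sketch in plain Python. It is not how the 16k-core system actually works, just an illustration under my own assumptions: a 1-D linear model, four "replicas" that each start from different parameters and see a different shard of the data, and a final step that averages their parameters. All the names (make_data, sgd, train_replicas, average_params) are made up for the example.

```python
import random

def make_data(n, rng):
    # Toy 1-D regression problem (an assumption): y = 3x + 1 plus small noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data

def sgd(shard, w, b, lr, epochs, rng):
    # Plain stochastic gradient descent on squared error for y = w*x + b.
    shard = list(shard)
    for _ in range(epochs):
        rng.shuffle(shard)
        for x, y in shard:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def train_replicas(data, n_replicas, seed=0):
    # Each replica gets its own random init and its own disjoint data shard.
    rng = random.Random(seed)
    shard_size = len(data) // n_replicas
    params = []
    for i in range(n_replicas):
        shard = data[i * shard_size:(i + 1) * shard_size]
        w0, b0 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        params.append(sgd(shard, w0, b0, lr=0.05, epochs=20, rng=rng))
    return params

def average_params(params):
    # Average the replicas' parameters into a single model.
    n = len(params)
    return (sum(w for w, _ in params) / n, sum(b for _, b in params) / n)

def mse(data, w, b):
    # Mean squared error of the model y = w*x + b on a dataset.
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

if __name__ == "__main__":
    train = make_data(400, random.Random(1))
    held_out = make_data(200, random.Random(2))
    replicas = train_replicas(train, n_replicas=4)
    w_avg, b_avg = average_params(replicas)
    for i, (w, b) in enumerate(replicas):
        print("replica", i, "held-out MSE:", mse(held_out, w, b))
    print("averaged model held-out MSE:", mse(held_out, w_avg, b_avg))
```

For a linear model like this one, averaging parameters is the same as averaging predictions. For a deep network that's no longer true, so take this as a cartoon of the ensemble intuition rather than a claim about what distributed training does to a real net.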

Can you have an ensemble of GPUs? Now that's an interesting question too :)

1 comment

What about ASICs? Here is a Bitcoin mining solution that uses them: http://www.butterflylabs.com/
Check this out, if you're curious about this line of thought: http://yann.lecun.com/exdb/publis/pdf/farabet-fpl-09.pdf

The ImageNet-winning model is essentially a glorified ConvNet :)