|
|
|
|
|
by d_burfoot
1056 days ago
|
|
> ... 2010. Before nobody believed that backprop can be GPU-accelerated. When I was doing my master's in 2004-06, I talked to a guy whose MSc thesis was about running NNs with GPUs. My thought was: you're going to spend a TON of time fiddling with hacky systems code like CUDA, to get basically a minor 2x or 4x improvement in training time, for a type of ML algorithm that wasn't even that useful: in that era the SVM was generally considered to be superior to NNs. So it wasn't that people thought it couldn't be done, it's that nobody saw why this would be worthwhile. Nobody was going around saying, "IF ONLY we could spend 20x more compute training our NNs, then they would be amazingly powerful". |
|
It's easy to see in retrospect, but hard in prospect: the original paper [1] on GPU acceleration of NNs reports a measly 20x speedup. Assuming a bit of cherry-picking on the author's side to make get the paper published, the 'real-world speedup' will have been assumed by the readership to be less. But this triggered a virtuos cycle of continuous improvements at all levels that has been dubbed "winning the hardware lottery" [2].
[1] K.-S. Oh, K. Jung, GPU implementation of neural networks.
[2] S. Hooker, The Hardware Lottery. https://arxiv.org/abs/2009.06489