by peterfirefly 894 days ago
https://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_a...

1) Schmidhuber does NOT claim to have invented it. He even provides lots of really old references. You know it's old when he didn't invent it, at least in his own mind.

2) Even with his generous attributions, "the first application of backpropagation to neural networks" is from 1980.

3) "LeCun et al. (1989) applied backpropagation to Fukushima’s convolutional architecture (1979)".

In other words, the chain rule is really old, but figuring out how to use it to adjust weights in neural nets was surprisingly unobvious. It was even less obvious that this was a good way of adjusting weights.
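To make "using the chain rule to adjust weights" concrete, here is a minimal sketch of backpropagation through one hidden layer. The architecture (tanh hidden units, a single linear output, squared-error loss) and the names `forward`, `loss`, `backprop`, and `sgd_step` are my own choices for illustration, not taken from any of the papers referenced above.

```python
import math

def forward(w1, w2, x):
    """Hidden layer h = tanh(W1 x), scalar output y = w2 . h."""
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in w1]
    y = sum(w * hi for w, hi in zip(w2, h))
    return h, y

def loss(w1, w2, x, target):
    """Squared error on a single example (an assumed loss, for illustration)."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Gradients of the loss w.r.t. every weight, via the chain rule."""
    h, y = forward(w1, w2, x)
    dy = y - target                       # dL/dy
    grad_w2 = [dy * hi for hi in h]       # dL/dw2_i = dL/dy * h_i
    # dL/da_i = dL/dy * w2_i * (1 - h_i^2), since d tanh(a)/da = 1 - tanh(a)^2
    da = [dy * w * (1.0 - hi * hi) for w, hi in zip(w2, h)]
    grad_w1 = [[dai * xj for xj in x] for dai in da]  # dL/dw1_ij = dL/da_i * x_j
    return grad_w1, grad_w2

def sgd_step(w1, w2, x, target, lr=0.1):
    """One gradient-descent update: move each weight against its gradient."""
    g1, g2 = backprop(w1, w2, x, target)
    new_w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, g1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g2)]
    return new_w1, new_w2
```

Each line of `backprop` is just the chain rule applied one layer further back. The unobvious parts were organizing the computation this way and discovering that repeating `sgd_step` actually finds useful weights.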


I've glanced through that material, and I still think it is all obvious. It just wouldn't have worked any earlier than when the results started coming out. If they'd tried these techniques in the 50s it'd have been computationally impossible. If they try them in 2020 they're computationally trivial.

These results would all have started to appear at about the time the cost of computation came within reach of researchers' budgets. The "theoretical" breakthroughs are of the form "we can implement this technique from the 60s and get good results". That is impressive, but it represents incremental improvements in hardware crossing key thresholds more than breakthroughs of knowledge. The breakthrough is noticing that hardware can now make something work.

> It just wouldn't have worked any earlier than when the results started coming out.

It most certainly would. Not in the 50's, of course, but in the 60's and 70's.

http://www.roylongbottom.org.uk/whetstone.htm

Look at the MFLOPS columns.

It seems to me that we had to wait until decent memory sizes and decent floating-point performance were a lot cheaper and therefore much more accessible. That made it much easier to do experiments without having to justify them to higher-ups, and only then did somebody figure out 1) how to do backpropagation on neural nets and 2) that it was useful.

In other words, it wasn't obvious at all. It required experimentation.

It would have been practically useful from the 60's (for small neural nets and high-value problems) and 70's (not so small neural nets or lower-value problems) if somebody had figured out how to do it and that it was a useful thing to do.
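A back-of-envelope sketch of that feasibility argument follows. Everything in it is an assumption for illustration: the rule of thumb of roughly 6 floating-point operations per weight per training example (about 2 for the forward pass and about 4 for the backward pass), the function names, and the example numbers. The 1 MFLOPS figure is a placeholder, not a value read from the Whetstone table linked above; substitute a real figure from the MFLOPS columns.

```python
def training_flops(n_weights, n_examples, n_epochs, flops_per_weight=6):
    """Rough total floating-point operations to train a dense network.

    flops_per_weight=6 is an assumed rule of thumb, not a measured value.
    """
    return flops_per_weight * n_weights * n_examples * n_epochs

def training_seconds(total_flops, mflops):
    """Wall-clock seconds at a sustained rate given in MFLOPS."""
    return total_flops / (mflops * 1e6)

# Hypothetical small problem: 1,000 weights, 1,000 examples, 100 epochs.
total = training_flops(n_weights=1_000, n_examples=1_000, n_epochs=100)
print(total)                                # 600000000
print(training_seconds(total, mflops=1.0))  # 600.0 (placeholder machine speed)
```

Plugging in a machine's actual MFLOPS rating shows whether a given network size lands in minutes, hours, or months, which is the threshold the argument turns on.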