Do you have a citation for that? It is undergrad-level maths, and I struggle to believe the technique is news to the AI people. Mathematicians would have known about it in theory for centuries.
1) Schmidhuber does NOT claim to have invented it. He even provides lots of really old references. You know it's old when he didn't invent it, at least in his own mind.
2) Even with his generous attributions, "the first application of backpropagation to neural networks" is from 1980.
3) "LeCun et al. (1989) applied backpropagation to Fukushima’s convolutional architecture (1979)".
In other words, the chain rule is really old but figuring out how to use that to adjust weights in neural nets was surprisingly unobvious. It was even more unobvious that that was a good way of adjusting weights.
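To make the "chain rule applied to weights" point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the cited survey: the network is a single tanh hidden unit feeding a linear output, the loss is squared error, and the names `forward`, `backprop`, `numeric_grad` and `train` are made up for this example. It compares the chain-rule gradients against finite differences, then uses them for plain gradient descent.

```python
import math

def forward(w1, w2, x):
    """Toy network (assumed for illustration): one tanh hidden unit, one linear output."""
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y

def loss(w1, w2, x, t):
    """Squared error between the network output and the target t."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    """Gradients of the loss w.r.t. both weights, obtained with the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = y - t                      # derivative of 0.5 * (y - t)^2 w.r.t. y
    dL_dw2 = dL_dy * h                 # since y = w2 * h
    dL_dh = dL_dy * w2                 # error signal passed back to the hidden unit
    dL_dw1 = dL_dh * (1 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2 with z = w1 * x
    return dL_dw1, dL_dw2

def numeric_grad(w1, w2, x, t, eps=1e-6):
    """Central finite differences, used only to check the backprop result."""
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
    return g1, g2

def train(w1, w2, x, t, lr=0.1, steps=200):
    """Plain gradient descent on a single (x, t) example."""
    for _ in range(steps):
        g1, g2 = backprop(w1, w2, x, t)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

Calling `backprop(0.5, -0.3, 1.5, 0.8)` and `numeric_grad(0.5, -0.3, 1.5, 0.8)` should give closely matching pairs, and `train` with the same starting values should lower the loss. The maths is indeed just the chain rule; the argument in this thread is about whether applying it this way, and trusting the result, was obvious.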
I've glanced through that material, and I still think it is all obvious. It just wouldn't have worked any earlier than when the results started coming out. If they'd tried these techniques in the 50s it'd have been computationally impossible. If they try them in 2020 they're computationally trivial.
These results would all have started to happen at about the time the cost of computation came within reach of researchers' budgets. The "theoretical" breakthroughs are of the form "we can implement this technique from the 60s and get good results". Which is impressive, but it does not represent breakthroughs of knowledge so much as incremental improvements in hardware crossing key thresholds. The breakthrough is detecting that hardware can make something work now.
It seems to me that we had to wait until decent memory sizes and decent fp performance were a lot cheaper and therefore much more accessible => much easier to do experiments without having to justify them to higher-ups => somebody figured out 1) how to do backpropagation on neural nets and 2) that it was useful.
In other words, it wasn't obvious at all. It required experimentation.
It would have been practically useful from the 60s (for small neural nets and high-value problems) and 70s (not-so-small neural nets or lower-value problems) if somebody had figured out how to do it and that it was a useful thing to do.
We might note that it wouldn't have been useful if it had been discovered much earlier. There is a minimum amount of computation required to get good results out of neural nets, and we've only crossed it relatively recently. From some perspectives the technique could be argued to be the most computationally intensive approach to problem solving humans have employed to date.