Hacker News
by pncnmnp 300 days ago
I have a question that's bothered me for quite a while now. In 2018, Michael Jordan (UC Berkeley) wrote a rather interesting essay - https://medium.com/@mijordan3/artificial-intelligence-the-re... (Artificial Intelligence — The Revolution Hasn’t Happened Yet)

In it, he stated the following:

> Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.

I was wondering whether anyone could point me to the paper or piece of work he was referring to. There are many citations in Schmidhuber’s piece, and in my previous attempts I've gotten lost in papers.

7 comments

Perhaps this:

Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths.

[1] https://claude.ai/public/artifacts/8e1dfe2b-69b0-4f2c-88f5-0...

Thanks! This might be it. I looked up Henry J. Kelley on Wikipedia, and in the notes I found a citation to this paper from Stuart Dreyfus (Berkeley): "Artificial Neural Networks, Back Propagation and the Kelley-Bryson Gradient Procedure" (https://gwern.net/doc/ai/nn/1990-dreyfus.pdf).

I am still going through it, but the latter is quite interesting!

Count another in the win column for the USA's heavy investment into basic sciences during the space race.

So sad to see the current state. Hopefully we can turn it around.

It is in Applied Optimal Control by Bryson and Ho (1969). Yann LeCun acknowledges this in his 1989 paper on backpropagation: https://new.math.uiuc.edu/MathMLseminar/seminarPapers/LeCunB....

> "Since his first work on the subject, the author has found that A. Bryson and Y.-C. Ho [Bryson and Ho, 1969] described the backpropagation algorithm using Lagrange formalism. Although their description was, of course, within the framework of optimal control rather than machine learning, the resulting procedure is identical to backpropagation."
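For anyone who wants to see the equivalence concretely, here is a minimal sketch of the costate (adjoint) recursion on a toy discrete-time control problem. The dynamics, the matrices `A` and `B`, and the function names are all my own assumptions for illustration, not anything taken from Bryson and Ho or from LeCun's paper. The backward loop is the Lagrange-multiplier recursion, and it is the same computation as backpropagating through a stack of tanh layers; the finite-difference function is only there as a check.

```python
import numpy as np

# Hypothetical toy system: x_{t+1} = tanh(A x_t + B u_t), with a terminal cost.
A = np.array([[0.9, -0.2], [0.1, 0.8]])
B = np.array([[0.5], [1.0]])

def step(x, u):
    """One time step of the dynamics (equivalently, one network layer)."""
    return np.tanh(A @ x + B @ u)

def rollout(x0, us):
    """Forward pass: return the whole state trajectory [x_0, ..., x_T]."""
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs

def terminal_cost(x_final, target):
    return 0.5 * np.sum((x_final - target) ** 2)

def adjoint_gradient(x0, us, target):
    """Gradient of the terminal cost w.r.t. every control via the costate recursion."""
    xs = rollout(x0, us)
    lam = xs[-1] - target                      # costate at the final time
    grads = [None] * len(us)
    for t in reversed(range(len(us))):
        # xs[t + 1] = tanh(z_t), so d tanh(z_t)/dz_t = 1 - xs[t + 1]**2
        local = (1.0 - xs[t + 1] ** 2) * lam
        grads[t] = B.T @ local                 # dJ/du_t
        lam = A.T @ local                      # costate one step earlier
    return grads

def finite_difference_gradient(x0, us, target, eps=1e-6):
    """Central differences, used only to check the adjoint result."""
    grads = []
    for t in range(len(us)):
        g = np.zeros_like(us[t])
        for i in range(us[t].size):
            plus = [u.copy() for u in us]
            minus = [u.copy() for u in us]
            plus[t][i] += eps
            minus[t][i] -= eps
            g[i] = (terminal_cost(rollout(x0, plus)[-1], target)
                    - terminal_cost(rollout(x0, minus)[-1], target)) / (2 * eps)
        grads.append(g)
    return grads
```

Calling both functions on the same `x0`, list of controls, and `target` should give gradients that agree to within finite-difference error, at the cost of one backward sweep instead of two rollouts per control component.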

See Widnall's overview here which discusses some of the ground that crosses over with what has come to be known as backpropagation:

The Minimum-Time Thrust-Vector Control Law in the Apollo Lunar-Module Autopilot (1970)

https://www.sciencedirect.com/science/article/pii/S147466701...

Apologies - I should have been clear. I was not referring to Rumelhart et al., but to pieces of work that point to "optimizing the thrusts of the Apollo spaceships" using backprop.

Kelley 1960 (the gradient/adjoint flight‑path paper) https://perceptrondemo.com

AIAA 65‑701 (1965) “optimum thrust programming” for lunar transfers via steepest descent (Apollo‑era) https://arc.aiaa.org/doi/abs/10.2514/6.1965-701

Meditch 1964 (optimal thrust programming for lunar landing) https://openmdao.github.io/dymos/examples/moon_landing/moon_...

Smith 1967 & Colunga 1970 (explicit Apollo‑type trajectory/re‑entry optimization using adjoint gradients) https://ntrs.nasa.gov/citations/19670015714

One thing AI has been great for recently is searching for obscure or indirect references like this, which might be one step removed from the specific thing you're looking for. It also helps with tip-of-the-tongue searches, where you have forgotten a phrase or know you're using the wrong wording.

It's cool that you can trace the work of these rocket scientists all the way to state-of-the-art AI.

I don't know if there is a particular paper exactly, but Ben Recht has a discussion of the relationship between techniques in optimal control that became prominent in the 60's, and backpropagation:

https://archives.argmin.net/2016/05/18/mates-of-costate/

Rumelhart et al. wrote "Parallel Distributed Processing"; there's a chapter where he proves that the backprop algorithm maximizes "harmony", which is simply a different formulation of error minimization.

I remember reading this book enthusiastically back in the mid 90s. I don't recall struggling with the proof; it was fairly straightforward. (I was in my senior year of high school at the time.)

They're probably talking about Kalman Filters (1961) and LMS filters (1960).

To be fair, any multivariable regulator or filter (estimator) that has a quadratic component (LQR/LQE) will naturally yield a solution similar to backpropagation when an iterative algorithm is used to optimize its cost or error function through a differentiable tangent space.

So yeah, this was what I was thinking for a while. What about a more nonlinear estimator? Intuitively seems similar to me.

I believe the reason it works in nonlinear cases is that the derivative is “naturally linear” (to calculate the derivative, you are considering ever smaller regions where the cost function is approximately linear - exactly “how nonlinear” the cost function is elsewhere doesn’t play a role).

that makes a lot of sense actually. thank you.

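A quick numerical illustration of the "naturally linear" point, as a sketch only: the function `f` below is an arbitrary nonlinear cost I made up, and `grad_f` is its hand-derived gradient. The first-order prediction `f(x) + h * grad_f(x) @ direction` misses the true value by an amount that shrinks roughly quadratically in the step size `h`, so in a small enough region the linear model is all you need.

```python
import numpy as np

def f(x):
    """Hypothetical nonlinear cost, chosen only for illustration."""
    return np.sin(x[0]) * np.exp(x[1]) + x[0] ** 2 * x[1]

def grad_f(x):
    """Hand-derived gradient of f."""
    return np.array([
        np.cos(x[0]) * np.exp(x[1]) + 2.0 * x[0] * x[1],
        np.sin(x[0]) * np.exp(x[1]) + x[0] ** 2,
    ])

def linearization_error(x, direction, h):
    """Gap between the true value and the first-order (linear) prediction."""
    predicted = f(x) + h * grad_f(x) @ direction
    return abs(f(x + h * direction) - predicted)

x = np.array([0.3, -0.5])
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)
for h in (1e-1, 1e-2, 1e-3):
    print(h, linearization_error(x, direction, h))
```

Halving `h` should cut the error by about a factor of four, regardless of how nonlinear `f` is far away from `x`.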
> ... first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.

I think "its" refers to control theory, not backpropagation.