Hinton didn’t invent backprop.

> Explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks apparently was first described in a 1970 master's thesis (Linnainmaa, 1970, 1976), albeit without reference to NNs. BP is also known as the reverse mode of automatic differentiation (e.g., Griewank, 2012), where the costs of forward activation spreading essentially equal the costs of backward derivative calculation. See early BP FORTRAN code (Linnainmaa, 1970) and closely related work (Ostrovskii et al., 1971).

> BP was soon explicitly used to minimize cost functions by adapting control parameters (weights) (Dreyfus, 1973). This was followed by some preliminary, NN-specific discussion (Werbos, 1974, section 5.5.1), and a computer program for automatically deriving and implementing BP for any given differentiable system (Speelpenning, 1980).

> To my knowledge, the first NN-specific application of efficient BP as above was described by Werbos (1982). Related work was published several years later (Parker, 1985; LeCun, 1985). When computers had become 10,000 times faster per Dollar and much more accessible than those of 1960-1970, a paper of 1986 significantly contributed to the popularisation of BP for NNs (Rumelhart et al., 1986), experimentally demonstrating the emergence of useful internal representations in hidden layers.

https://people.idsia.ch/~juergen/who-invented-backpropagatio...

Hinton wasn’t the first to use NNs for language models either. That was Bengio.
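For anyone wondering what "BP is the reverse mode of automatic differentiation" means concretely, here's a rough sketch in Python. It is only an illustration, not Linnainmaa's FORTRAN code or any library's API; the `Var`, `tanh`, and `backward` names are made up for this example. Each operation records its inputs and local derivatives during the forward pass, and a single backward sweep over that record applies the chain rule to get the gradient with respect to every input. That single sweep is why the backward cost is roughly proportional to the forward cost.

```python
import math


class Var:
    """Hypothetical minimal scalar node in a computation graph."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent node, d(self)/d(parent)), recorded during the forward pass.
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def tanh(x):
    # Hypothetical helper: forward value plus its local derivative 1 - tanh^2.
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))


def backward(output):
    # Order the nodes so every node comes after its inputs (forward order).
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # One reverse sweep: each node pushes its gradient to its inputs (chain rule),
    # accumulating when a node feeds into several others.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


# A single "neuron" y = tanh(w*x + b).
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = tanh(w * x + b)
backward(y)
print(y.value, w.grad, x.grad, b.grad)  # 0.0 2.0 0.5 1.0
```

Here w*x + b = 0, so tanh' = 1 and the gradients reduce to dy/dw = x = 2.0, dy/dx = w = 0.5, dy/db = 1.0. Libraries like PyTorch or JAX do the same thing at scale, with tensors instead of scalars.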