| Watch Karpathy's 'Zero to Hero' videos on YouTube. If you want a historical perspective, which is very worthwhile, start by reading about the mid-century work of McCulloch and Pitts, and then Minsky, Papert and their colleagues at the MIT AI Lab (a predecessor of today's CSAIL). There will be a dry spell after Minsky and Papert because of their conclusion that the OG neural-network topology that everyone was familiar with, the so-called "perceptron", was a dead end. That conclusion was premature to say the least, but in any event the hardware and training techniques weren't available to support any serious progress. Adding hidden layers and nonlinear activation functions to the perceptron network seemed promising, in that they worked around some of Minsky's technical objections, such as a single-layer perceptron's inability to represent XOR. The multi-layer perceptron was now a "universal approximator", capable in principle of approximating any continuous function to arbitrary accuracy given enough hidden units. In retrospect that should have been considered a bigger deal than it was, but the MLP was still a PITA to train, and it didn't seem very useful at the scales achievable in hardware at the time. Anything a neural net could do, specialized code could usually do better and cheaper. Then, in 2012, AlexNet dusted off some of the older ideas and used them to win the ImageNet image-recognition competition, not by a small margin but by blowing everybody else into the weeds. That brought deep multi-layer networks back into vogue, and almost everything that has happened since can be traced back to that work. The Karpathy videos are the best intro to the MLP concept I've run across. Understanding the MLP is the key prereq if you want to understand current-gen AI from first principles. |
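If it helps to see why the hidden layer mattered, here's a rough sketch in plain Python. It's my own illustration, not something from the Karpathy videos. The weights in `mlp_xor` are hand-picked rather than trained, and the names (`step`, `perceptron`, `mlp_xor`, `single_layer_solves_xor`) are made up for this example. The brute-force search only covers a small weight grid, so it illustrates the XOR limitation rather than proving it.

```python
from itertools import product

# Truth table for XOR, the classic function a single perceptron cannot represent.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z):
    """Threshold (nonlinear) activation: fires only when the input is positive."""
    return 1 if z > 0 else 0


def perceptron(x1, x2, w1, w2, b):
    """A single unit: weighted sum of two inputs plus a bias, then threshold."""
    return step(w1 * x1 + w2 * x2 + b)


def mlp_xor(x1, x2):
    """A 2-2-1 network with hand-picked (not trained) weights.

    Hidden unit 1 acts as OR, hidden unit 2 acts as AND, and the
    output unit fires when OR is on but AND is off, which is XOR.
    """
    h_or = perceptron(x1, x2, 1.0, 1.0, -0.5)
    h_and = perceptron(x1, x2, 1.0, 1.0, -1.5)
    return perceptron(h_or, h_and, 1.0, -1.0, -0.5)


def single_layer_solves_xor(grid):
    """Try every (w1, w2, b) on the grid with one unit and no hidden layer."""
    for w1, w2, b in product(grid, repeat=3):
        if all(perceptron(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items()):
            return True
    return False


# Candidate weights from -2.0 to 2.0 in steps of 0.25 (an arbitrary choice).
grid = [i / 4 for i in range(-8, 9)]

mlp_outputs = [mlp_xor(x1, x2) for (x1, x2) in XOR]
print("two-layer outputs:", mlp_outputs)
print("single unit solves XOR on this grid:", single_layer_solves_xor(grid))
```

The single unit fails because XOR is not linearly separable: no one straight line splits the four input points correctly. The two hidden units each draw their own line, and the output unit combines them. Training those weights automatically instead of picking them by hand is the part the videos walk through.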