Some 15 years ago, textbooks taught that multilayer perceptrons (fully connected feed-forward networks) with one hidden layer were sufficient because they were universal approximators. That thought kinda held back the field for a long time. Going against that dogma was so revolutionary that the new paradigm was given its own name: deep learning.
Just because you can find some gotcha counterexample that LLMs struggle with doesn't invalidate that we've come a very long way.
I think that was largely a misunderstanding. 20+ years ago I took an AI class that mentioned that using multiple layers was useful for training neural networks. It also mentioned that a 2-layer network was only a universal approximator given arbitrarily large numbers of nodes, which again seems to be forgotten about.
Though the teacher had worked in industry for a while, which may have been relevant, as we didn’t focus that much on theory.
PS: Deep learning was also more about improving computational power than some major theoretical advancement.
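For reference, the universal approximator caveat can be written out roughly as follows. This is a sketch of the standard one-hidden-layer statement, not what the class notes said; the symbols $f$, $K$, $\sigma$, $N$, $a_i$, $w_i$, $b_i$ are my own notation, and the exact conditions on $\sigma$ vary between versions of the theorem.

```latex
% For a continuous target f on a compact set K \subset \mathbb{R}^n,
% a suitable (e.g. sigmoidal) activation \sigma, and any tolerance \varepsilon > 0,
% there exist a width N and parameters a_i, b_i \in \mathbb{R}, w_i \in \mathbb{R}^n with
\[
  \sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} a_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon .
\]
% The width N depends on f and \varepsilon and is not bounded in advance.
```

The theorem only says such a network exists. It gives no bound on how large $N$ has to be and says nothing about whether training will actually find those weights.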
Nah, I keep hearing this. I was doing multilayer in the 90s; the problem was that my machine didn’t even have a floating-point unit, so I had to hand-roll my own fixed-point math, and the CPU was about 100 MHz.
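For anyone who hasn't seen it, hand-rolled fixed point looks roughly like the sketch below. It is written in Python for readability and is not the original 90s code; the Q16.16 format and the names `to_fixed`, `fx_mul`, and `fx_dot` are assumptions chosen for illustration.

```python
# Q16.16 fixed point: a real number x is stored as the integer round(x * 2**16).
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a float to its Q16.16 integer representation."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product has 32 fractional bits,
    so shift right by 16 to get back to Q16.16."""
    return (a * b) >> FRAC_BITS


def fx_dot(weights, inputs) -> int:
    """Weighted sum for one neuron: accumulate raw products, shift once at the end."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc >> FRAC_BITS
```

Something like `to_float(fx_dot([to_fixed(0.5)], [to_fixed(2.0)]))` gives the neuron's pre-activation as a float. On real hardware of that era you would also have to worry about overflow in the accumulator, which Python's unbounded integers hide.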