| Ignoring the “spits out training data” bit, which is at best misleading, it’s interesting that you use the word “abstract” here. I recently followed Karpathy’s GPT-from-scratch tutorial and was fascinated by how clearly you could see the model improving. With no training, the model spits out uniformly random text. With a bit of training, it starts generating gibberish. With further training, it starts recognizing simple character patterns, like putting a consonant after a vowel. Then it learns syllables, then words, then sentences. With enough training (and data and parameters, of course) you eventually get a model like GPT-4 that can write better code than many programmers. It’s not always that clear-cut, but you can clearly observe it moving up the chain of abstraction as the training loss decreases. What happens when you go even bigger than GPT-4? We have every reason to believe that the models will be able to think more abstractly. Your “never gonna work” comment flies in the face of the exponential curve we find ourselves on. |