|
It's a pretty big deal, and there's a big difference between a Markov chain and a deep language model - the Markov chain will quickly converge, while the language model can scale with the data. But the way these models are talked about is misleading. They don't "answer questions", "translate", "explain jokes", or anything of that sort. They predict missing words. Since the network is so large, and the dataset has so many examples, it can scale up the following method:
1) Find a part of the network which encodes training data that is most similar to the prompt
2) Put the words from the prompt in place of the corresponding words in the encoding of the training data.
In other words, pattern matching. So if it has seen a similar question to the one given in the prompt (and given that it's trained on most of the internet, it will find thousands of uncannily similar questions), it will produce a convincing answer. How is that different from a human answering questions? A human uses pattern matching as part of the process, sure. But they also use, well, all the other abilities that together make up intelligence. They connect the meaningless symbols of the sentence to the mental representations that model the world - the ones pertaining to whatever the question is about. If I ask a librarian "What is the path integral formulation of quantum mechanics?", and they come back with a textbook and proceed to read the answer from page 345, my reaction is not "Wow, you must be a genius physicist!", it's "Wow, you sure know where to find the right book for any question!". In the same way, I'm impressed with GPT for being a nifty search engine, but then again, Google search does a pretty good job of that already. |
In my experience with language models, what they do cannot be reduced to madlibs. But that's obviously not an argument I can prove to you.
Can we agree that if the model can explain structurally novel jokes, then it must have some measure of true understanding?