|
I'd say this is more related to the observation that LLMs aren't going to be good at math. (As the article says, their current performance is surprising enough as it is, but I agree that it seems unlikely that just making bigger and bigger LLMs is going to make them substantially better at even arithmetic, to say nothing of higher math.) They have a decent understanding of "X before Y" as a textual phrase, but I think it would be hard for them to do much further logic based on that temporal relationship, because they lack a representation for it, just as they lack a representation suitable for math. I expect if you asked "Did $FAMOUS_EVENT happen before $OTHER_FAMOUS_EVENT?" it would do OK, just as "What is $FAMOUS_NUMBER plus $FAMOUS_NUMBER?" does OK, but as you get more obscure it will fall down badly on tasks that humans would generally do OK at. Though, no, humans are not perfect at this by any means either. It is important to remember that what this entire technology boils down to is "what word is most likely to follow the content up to this point?", iterated. What that can do is impressive, no question, but at the same time, if you try to imagine interacting with the world through that one and only tool, you may better understand the limitations of this technology too. There are some tasks that just can't be performed that way. (You'll have a hard time imagining it, though. It is very hard to think in that manner. As a human I really tend to think in a bare minimum of sentences at a time, which I then serialize into words. Trying to imagine operating in terms of "OK, what's the next word?" "OK, what's the next word?" "OK, what's the next word?" with no forward planning beyond what is implied by your choice of this particular word is not something that comes even remotely naturally to us.)
When this tech answers the question "Did $FAMOUS_EVENT happen before $OTHER_FAMOUS_EVENT?", it is not thinking "OK, this event happened in 1876 and the other event happened in 1986, so, yes, it's before." It is thinking "What is the most likely next word after '... $OTHER_FAMOUS_EVENT?'" and then "What is the next most likely word after that?" and so on. For famous events it is reasonably likely to get them right because the training data contains those relationships. It might even make mistakes in a very human manner. But it's not doing temporal logic, because it can't. There's nowhere for "temporal logic" to be taking place. |
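
To make the contrast concrete, here is a rough Python sketch of the two approaches. It is only an illustration under heavy assumptions: a toy bigram counter stands in for the model (a real LLM conditions on the whole preceding context, not just the last word), and the names `train_bigram`, `generate`, and `happened_before` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in a toy corpus.

    Assumption: a bigram table is a crude stand-in for a language model.
    """
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_words=10):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in the corpus
        # "What word is most likely to follow?", iterated.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


def happened_before(year_a, year_b):
    """Explicit temporal logic: compare two dates held as numbers."""
    return year_a < year_b
```

`happened_before(1876, 1986)` works for any pair of years because it operates on a representation of time. `generate` can only echo orderings that were frequent in its corpus; nothing in it holds a date or compares one.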