Hacker News
by simonh 1177 days ago
I see its knowledge structure as completely different from ours. For example, all the GPT variants can give an explanation of how to do arithmetic, or even quite advanced mathematics. They can explain the step-by-step process. None of them, until quite recently, could actually do it though. The most recent variants can to some extent, but not because they can explain the process. The mechanisms implemented to do maths are completely independent of the mechanisms for explaining it; they are completely unrelated tasks for an LLM.

This is because LLMs have been trained on many maths textbooks and papers explaining maths theory and procedures, so they encode token sequence weightings well suited to generating such texts. That must mean it knows how to do maths, right? I mean, it just explained the procedures very clearly, so obviously it can do maths. However, maths problems and mathematical expressions are completely different classes of texts from explanatory texts, involving completely unrelated token sequence weightings.

In all but the latest GPT variants, the token sequence weightings would generally get expressions kind of right, but the models understood hardly anything about the significance of numbers, so the numeric component of response texts would be basically just made up on the spot. Probabilistic best-guess token sequences just don’t work for formal logical structures like maths, so the training of the latest generation models has probably had to be heavily tuned to improve in this area.
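To make the "right shape, made-up numbers" point concrete, here is a deliberately tiny sketch. It is not how any GPT model works: a bigram counter is swapped in as a stand-in for "token sequence weightings", and the corpus, `train_bigrams` and `generate` are all made up for illustration. It only shows that picking the most likely next token can produce a well-formed expression whose number is wrong.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=1):
    """Greedily append the most frequent follower of the last token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # last token never seen in training, nothing to predict
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

# Hypothetical training text: every sum is correct, and "4" is the
# most common token to appear after "=".
corpus = [
    "2 + 2 = 4",
    "1 + 3 = 4",
    "3 + 1 = 4",
    "1 + 1 = 2",
    "3 + 3 = 6",
]

counts = train_bigrams(corpus)
answer = generate(counts, "3 + 3 =")
print(answer)  # 3 + 3 = 4
```

The output looks like arithmetic and every training example was correct, yet the toy model has no notion of what the numbers mean: it answers 4 for any sum, including ones it has seen, because 4 is the most frequent token after "=".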

The implications of this are obvious in the case of mathematics, but it provides a valuable insight into other types of answer. Just because it can explain something, we need to be very careful about concluding what that implies it does or doesn’t “know”. Knowledge for us and for LLMs means completely different things. I’m not at all saying it doesn’t know things; it just knows them in a radically different way from us, one that we find hard to understand and reason about, and that can be incredibly counterintuitive to us. If a human can explain how to do something, that means they know how to actually do it, but that’s just not at all necessarily so for an LLM. This was blatantly obvious and easy to demonstrate in earlier LLM generations, but is becoming less obvious as workarounds, tuned training texts and calls to specialist models or external APIs are used behind the scenes to close the capability gap between explanatory and practical ability.

This is just one example illustrating one of the ways they are fundamentally different from us, but all the cases of LLMs being tricked into generating absurd or weird responses also illustrate many of the other ways their knowledge and reasoning architecture varies enormously from ours. These things are incredibly capable: they are essentially very smart and sophisticated, but also very alien, intelligences.

1 comment

You’re right of course. The LLM is a calculator continuously predicting a best-fitting next token based on the data it was trained on.

If its outputs resemble human reasoning, it’s because the encoding and training process managed to capture those patterns and use them to simulate fitting text. There is no real reasoning happening or second-order thought, other than a simulation of that happening through the mimicry of human writing.

LLMs can’t be prompted to perform actual reasoning, but they can be told to generate “thoughts” about what they’re doing that bring out more nuanced detail when they give their answers. This isn’t any more magical than writing out a more thoughtful prompt to get a conditioned answer; it’s just getting the LLM to flesh out the prompt engineering for you in the general direction you want it to go.

That seems rather fundamental to me: the idea that with some generic prompting the model tries to fit what it thinks reasoning looks like and can then take advantage of the additional context that would otherwise be buried too deep to influence its answer.

I suspect that prompting the model to explore “thought” asks it to go down paths of linguistic connections that are related to the topic but not immediately connected to the answer in a way that would immediately influence the top predictions. Bringing summaries of those connections into the token context is a kind of zero-shot training on their relevancy to forming an answer.

To me this is less “reasoning” and more suggestive of the idea that some of the heuristics for data retrieval and question answering we collectively refer to as reasoning have broader applications.