by vidarh 980 days ago
We can inspect these models, and repeated attempts at doing so show that they have, e.g., built internal models from learning that generalise, so I'm not sure what you think that tells you that justifies arguing they can't reason. Beyond that, trying to reverse-engineer the specific "reasoning" that leads to a given output is generally hard, and as far as I am aware no attempt at doing so has produced any evidence that the way they work is conclusively not reasoning.

> We know as a fact that they are static and not generating new knowledge from input.

We know that the models in themselves, if not wired up to be finetuned during operation, are static. This is not a property of an LLM but of the environment it runs in. We know the second part is wrong - they produce output that often contains new knowledge. That this output needs to be fed back in as context to act as short-term memory in common setups like ChatGPT, where finetuning does not happen automatically during operation, does not mean it is not produced.
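
To make the model-vs-environment distinction concrete, here is a minimal sketch. It assumes a toy `frozen_model` function standing in for an LLM with fixed weights (hypothetical, not any real API): the function itself never changes, yet the model+context system remembers earlier turns because each output is appended to the context and fed back in.

```python
from typing import List

NAME_PREFIX = "user: my name is "

def frozen_model(context: List[str]) -> str:
    """Toy stand-in for an LLM with fixed weights (hypothetical).

    It is a pure function of the context it is given: no state is
    kept between calls, and nothing about it is updated.
    """
    if context and context[-1] == "user: what is my name?":
        # Anything the "model" knows about earlier turns comes from the context.
        for line in context:
            if line.startswith(NAME_PREFIX):
                return "Your name is " + line[len(NAME_PREFIX):]
        return "I don't know"
    return "ok"

def chat(turns: List[str]) -> List[str]:
    """Chat-style loop: the environment, not the model, carries the memory."""
    context: List[str] = []
    replies: List[str] = []
    for turn in turns:
        context.append("user: " + turn)
        reply = frozen_model(context)
        # Feed the output back in so later calls can see it.
        context.append("assistant: " + reply)
        replies.append(reply)
    return replies

if __name__ == "__main__":
    print(chat(["my name is Ada", "what is my name?"]))
    print(frozen_model(["user: what is my name?"]))
```

Called through the loop, the static function answers the second question using the first turn; called on its own with no history, it can't. That is the sense in which you need to assess model+context rather than the model in isolation.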

> I don't understand why you keep talking about human capabilities - your guesses about what humans may or may not be able to do are irrelevant. You can hold whatever opinion you like about my ability to reason, but I'd suggest using less wishful thinking with regards LLMs.

I keep talking about human capabilities because I presume that you would not argue that there are humans of normal intellect who are incapable of reasoning.

To be able to assert with any confidence that LLMs do not reason, you need a definition of reasoning that LLMs cannot meet (not just do not meet in a single test), but that won't result in claiming there are a lot of people around who can't reason.

Am I wrong? Do you believe there are humans that do not clear the bar and are unable to reason?

To me, a "chat-style" setup of an LLM that provides a feedback loop and memory through context, albeit a small one, clears the bar you set for reasoning with ease and is able to extrapolate and reason about e.g. software at a level that sometimes - but certainly not always, nor consistently - exceeds what I see from experienced developers.

That it also fails does not alter that part - humans fail to apply reasoning all the time. Depressingly often, if anything.

> You can hold whatever opinion you like about my ability to reason, but I'd suggest using less wishful thinking with regards LLMs.

Nothing I've said here is wishful thinking. All I've done is point to direct experience, and point out that there is no evidence for the claim that they are not able to reason and that the arguments put forward for that claim here have not been logically sound.

I will say that in my opinion they can reason by my subjective idea of what reasoning means, without necessarily being able to define that precisely. But I also will not argue that it is objectively true that they can reason, as that is equally problematic without first defining reasoning in an objectively measurable way. That definition is needed because, as you can tell, we disagree on whether they clear your bar - to me, your bar the way you described it is trivial for them to meet.

To me, a whole lot of the discussions in this thread are evidence of how exceedingly low the bar for what counts as reasoning needs to be for us not to have to exclude a whole lot of people as unable to reason. Humans get hung up on ideas and refuse to budge all the time - I do it too - and as a result refuse to take in new information, fail to generalise, and keep making flawed arguments. Yet we would generally not claim that this means people are unable to reason, even when it gets to the level where we might think that a person does not reason in that specific case.

That in itself does not mean they can reason, but to me the typical arguments claiming they can't reason tend to be exceedingly poorly reasoned.

Pointing out the static nature of the models gets closer, and is perhaps the best argument against their reasoning ability I've heard, but it is weak for two reasons. First, chat-style models effectively use context as short-term memory, so you need to assess model+context. Second, it's not a qualitative limitation of the model architecture but of the sandbox we've put it in, where we don't continue fine-tuning from the conversations in real time. And even so, there have been humans without the ability to form long-term memories, and I doubt you'd argue they were unable to reason.

> They're very useful, but not for reasoning.

To me, they have been very useful for their reasoning ability in a wide range of cases. It's hit and miss. LLMs are extremely dumb in some areas, and do well in others. Using them blindly and just assuming they'll do well in a given test will not work. Hence the point that it is not logically sound to argue that they are unable to reason because they failed to generalise in a specific test: if it were, we would need to conclude that most humans (myself included) can't reason, because we all fail to do so on a regular basis.

My example was extrapolating from a simple description of a (non-existent) programming language to translating programs into it, explaining how one works, reasoning about the design tradeoffs, or even symbolically "executing" it and telling me what the output would be. I know from first-hand experience, having used it as a means to assess the analytical capabilities of even quite experienced developers, that this is something a lot of really smart people struggle with, yet when I've experimented with GPT4 I have gotten good results.