Hacker News
by cristiancavalli 498 days ago
No, I can “prove” it: look at any number of cases where LLMs can’t even do basic value comparisons despite being claimed as super intelligent. You can try to say, well, that’s a limitation of the technology, and then I would reply: yes, and that’s why I would say it’s not reasoning according to the original human definition. Also, you have yet to produce any evidence of reasoning, and claiming you can over and over again doesn’t add substance to your arguments. I would be interested in your proof that some answer can’t be pattern matched, too. At this point I wonder if we could create a non-conscious “intelligence” that, if large enough, would be able to describe most things known to us along some line of probability we couldn’t compute with our brain architecture, and be close to 99.99999% right. Even if we had this theoretical probability-based super intelligence, it still wouldn’t be “reasoning”, but it could be more “intelligent” than us.

I’m also not entirely convinced we can’t arrive at a reasoning system via probability only (a really cool thought experiment) but these systems do not meet the consistency/intelligence bar for me to believe this currently.


LLMs can reason; they just don’t always reason.

That’s the claim everyone is making, and it matches the human definition: if something reasoned correctly even once, it can reason. That is the colloquial definition.

Someone who has brain damage can reason correctly on certain subjects and incorrectly on other subjects. This is an immensely reasonable definition. I’m not being pedantic or out of line here when I say LLMs can reason while using this definition.

Nobody is making the claim that LLMs reason like humans, are human, or reason perfectly every time. Again, the claim is: LLMs are capable of reasoning.

No, reasoning is about applying rules of logic consistently, so if you only do it some of the time, that's not reasoning.

If I roll a die and it only _sometimes_ returns the correct answer to a basic arithmetic question, that is exactly why we don't say a die is doing arithmetic.

It's even worse in the case of LLMs, where the errors are caused not only by chance but also by training bias and hallucinations.

You can claim nobody knows the exact definition of reasoning; maybe there are some edges that aren't clearly defined because they belong to philosophy. But applying the rules of logic consistently is not something you can do only some of the time and still call it reasoning.

Also, LLMs are generally incapable of saying they don't know something, cannot know something, can't do something, etc. They would rather try to answer and hallucinate. When an LLM does that, it's not reasoning. And you can't explain to an LLM how to figure out that it doesn't know something and then have it actually say it doesn't know instead of making stuff up. If it were capable of reasoning, you should be able to convince it, using _reason_, to do exactly that.

However, you

I still think the jury is out on this, given that they seem to fail on obvious things that humans reason about trivially. Perhaps they reason differently, in which case I would need to understand how this reasoning differs from human reasoning (perhaps biological reasoning more generally?), and then I would want to consider whether one ought to call it reasoning given its differences (if there are any at the time of sampling). I understand your claim; I’m just not buying it based on the current evidence and my interactions with these supposed “super intelligences” every day. I still find these tools valuable, just unable to “reason” about a concept. That makes me think that, as powerful and meaning-filled as language is, our assumption of reasoning might just be a trick of our brains, which reason through a more tightly controlled stochastic space, and of us projecting the concept of reasoning onto a system. I see the COT models contort and twist language in a simulacrum of “reasoning”, but any high school English teacher can tell you that a lot of text appears to reason logically yet doesn’t actually do anything of the sort once read with the requisite knowledge of the subject matter.

They can fail at reasoning. But they can demonstrably succeed too.

So the statement that they CAN reason is demonstrably true.

OK, if an LLM is given a prompt whose solution can only be reached by reasoning, and it reaches the solution for that single prompt, then how can you say it can't reason?

Given your set of hypotheticals, I would concede that yes, the model is reasoning. At that point, though, the world would probably be far more concerned with your discovery of a question that can only be answered via reasoning, one that is uninfluenced by, and has no parallel in, any empirical phenomenon, including written knowledge as a medium of transference. The core issue I see here is proving that the model is actually reasoning in a concrete way that isn’t just a simulacrum, as the Apple researchers et al. theorize it to be.

If you do find this question-answer pair, it would be a massive breakthrough for science and philosophy more generally.

You say “demonstrably” but I still do not see a demonstration of these reasoning abilities that is not subject to the aforementioned criticisms.

This looks neat, but I don’t think it meets the standard for “reasoning only” (I’m still not sure how you would prove that one). Furthermore, it looks fairly generalizable in pattern and form to other grid problems, so I don’t think it meets the bar for “not being in the training data” either. We know these models can generalize somewhat based on their training, but not consistently, and certainly not consistently well. Again, I’m not claiming that responding to a novel prompt is a sign of reasoning; as others have pointed out, a calculator can do that too.

Your quote: “This is a unique problem I came up with. It’s a variation on counting islands.” You then say: “as I came up with it so no variation of it really exists anywhere else.”

So I’m not sure what to take away from your text, but I do think this is a variation of a well-known problem type, so I would be pretty amazed if there weren’t something very close to it in the training data. Given that it’s an interview question, and those are written about ad nauseam, I’m not surprised that it was able to generalize to the provided case. The COT researchers did see the ability to generalize in some cases, but the models did not necessarily use the COT tokens to reason, and/or they failed to generalize on variations they should have handled, given their ability to generalize on others and the postulation that they were reasoning rather than just pattern matching against a larger corpus.

Just say it: LLMs are random machines. Even a broken clock is right twice a day.