HN Mirror


by bastien2 709 days ago

You can't "enhance" from zero. LLMs by design are not capable of reason.

We can observe LLM-like behaviour in humans: all those reactionaries who just parrot whatever catchphrases mass media programmed into them. LLMs are just the computer version of that uncle who thinks Fox News is true and is the reason your nieces have to wear long pants at family gatherings.

He doesn't understand the catchphrases he parrots any more than the chatbots do.

Actual AI will require a kind of modelling that as yet does not exist.

6 comments

nl 709 days ago

> LLMs by design are not capable of reason.

This isn't true.

A deep neural network certainly can emulate the logical functions we think of as "reasoning" (i.e., AND/OR/XOR functions).

See for example:

https://cprimozic.net/blog/boolean-logic-with-neural-network...

https://www.cs.toronto.edu/~axgao/cs486686_f21/lecture_notes...

https://towardsdatascience.com/emulating-logical-gates-with-...

https://medium.com/@stanleydukor/neural-representation-of-an...
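As a minimal sketch of that claim (the weights are hand-picked for illustration, not learned and not taken from any of the linked posts), a two-layer network of threshold neurons is enough to compute XOR:

```python
def step(x: float) -> int:
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias) -> int:
    """A single artificial neuron: weighted sum plus bias, then threshold."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


# Single neurons are enough for the linearly separable gates.
def and_gate(a: int, b: int) -> int:
    return neuron((a, b), (1.0, 1.0), -1.5)


def or_gate(a: int, b: int) -> int:
    return neuron((a, b), (1.0, 1.0), -0.5)


def nand_gate(a: int, b: int) -> int:
    return neuron((a, b), (-1.0, -1.0), 1.5)


def xor_gate(a: int, b: int) -> int:
    """XOR is not linearly separable, so it needs a hidden layer:
    hidden units compute OR and NAND, the output unit ANDs them."""
    return and_gate(or_gate(a, b), nand_gate(a, b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_gate(a, b))
```

This only shows that the architecture can represent these functions; whether a trained LLM actually uses them this way is a separate question.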

amenhotep 709 days ago

What an odd comment. Would you assert also that an 8080 is "capable of reason"?

nl 709 days ago

Usually, when people say "LLMs can't reason", they are claiming that LLMs are unable to do logical inference (although the claims are often quite hard to pin down to something specific).

Yes, an 8080 is capable of reasoning. Prolog runs well, see for example: https://medium.com/@kenichisasagawa/exploring-the-wonders-of...
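To make "logical inference" concrete, here is a rough sketch of forward chaining over simple if-then rules. Note that this is not how Prolog itself works (Prolog searches backward from a goal); it is a simpler relative, and the fact and rule names are made up for illustration:

```python
def forward_chain(facts, rules):
    """Derive everything that follows from `facts` under `rules`.

    `rules` is a list of (premises, conclusion) pairs. A rule fires once
    all of its premises are known. Repeat until nothing new is derived.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    # Hypothetical facts and rules, purely for illustration.
    facts = {"socrates_is_human"}
    rules = [
        (("socrates_is_human",), "socrates_is_mortal"),
        (("socrates_is_mortal",), "socrates_will_die"),
    ]
    print(sorted(forward_chain(facts, rules)))
```

Nothing here needs more than comparisons and loops, which is the point: this kind of inference is well within reach of very modest hardware.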

Kerb_ 709 days ago

I would say integrated circuits in general are not incapable of reason by design, even if some examples may be. Somehow a bunch of meat and fat is capable of reason, even if my steak isn't.

nl 708 days ago

There are a lot of negated negatives in that sentence.

One might say parsing it is a good example of logical inference which is what I think most people mean when they say “reasoning”.

belter 709 days ago

> LLMs by design are not capable of reason.

It is not that clear-cut. The argument is that the patterns they learn from text encode several layers of abstraction, one of them being some reasoning, as it is encoded in the discourse.

wizzwizz4 709 days ago

They are capable of picking up incredibly crude, noisy versions of first-order symbolic reasoning, and specific, commonly-used arguments, and the context for when those might be applied.

Taken together and iterated, you get something vaguely resembling a reasoning algorithm, but your average schoolchild with an NLP library and regular expressions could make a better reasoning algorithm. (While I've been calling these "reasoning algorithms" for analogy's sake, they don't actually behave how we expect reasoning to behave.)

The language model predicts what reasoning might look like. But it doesn't actually do the reasoning, so (unless it has something capable of reasoning to guide it), it's not going to correctly derive conclusions from premises.

belter 709 days ago

Yes and no. I don't entirely disagree with you, but think about when you ask a model to explain a conclusion step by step. It is not doing the reasoning, but in a way it has abstracted and learned the pattern of doing the reasoning. So it is doing some type of reasoning, and sometimes producing the outcomes that are derived from actual reasoning, even if defining "actual reasoning" is a whole new challenge.

jiggawatts 709 days ago

It took a long time for the limitations of LLMs to "click" for me.

Let's say there's a student reading 10 books on some topic. They notice that 9 of the books say "A is the answer" and just 1 book says "B is the answer". From that, the student will conclude and memorise that 90% of authors agree on A and that B is the 10% minority opinion.

If you train an LLM on the same data set, then the LLM will learn the same statistical distribution but won't be able to articulate it. In other words, if you start off with a generic intro blurb paragraph, it'll be able to complete it with the answer "A" 90% of the time and the answer "B" 10% of the time. What it won't be able to tell you is what the ratio is between A and B, and it won't "know" that B is the minority opinion.

Of course, if it reads a "meta review" text during training that talks about A-versus-B and the ratios between them, it'll learn that, but it can't itself arrive at this conclusion from simply having read the original sources!
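The distinction in the book example can be sketched with a toy model. This is only an illustration of the argument, not a claim about how real LLMs work internally: `ToyCompleter` and `explicit_ratio` are made-up names, and the "model" is just a frequency table that can sample completions.

```python
import random
from collections import Counter

# Nine books say "A", one says "B" (the example from the comment above).
books = ["A"] * 9 + ["B"]


class ToyCompleter:
    """Learns answer frequencies and can only emit sampled completions.

    It reproduces the 90/10 distribution in its outputs, but its only
    interface is `complete()`; it never states the ratio itself.
    """

    def __init__(self, corpus):
        self._counts = Counter(corpus)

    def complete(self, rng: random.Random) -> str:
        answers = list(self._counts)
        weights = [self._counts[a] for a in answers]
        return rng.choices(answers, weights=weights, k=1)[0]


def explicit_ratio(corpus):
    """The 'next level up': a statement about the sources themselves."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    model = ToyCompleter(books)
    samples = [model.complete(rng) for _ in range(1000)]
    print("share of 'A' in completions:", samples.count("A") / len(samples))
    print("explicit summary of sources:", explicit_ratio(books))
```

The sampled completions match the distribution, while the summary is a separate computation over the sources, which is roughly the gap being described.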

THIS more than anything seems to be the limit of LLM intelligence: they're always one level behind humans when trained on the same inputs. They can learn only to reproduce the level of abstraction given to them, they can't infer the next level from the inputs.

I strongly suspect that this is solvable, but the trillion-dollar question is how? Certainly, vanilla GPT-style networks cannot do this; something fundamentally new would be required at the training stage. Maybe there needs to be multiple passes over the input data, with secondary passes somehow "meta-training" the model. (If I knew the answer, I'd be rich!)

mewpmewp2 708 days ago

But if you give it those 10 books in the prompt, it will be able to spot that 1 of the authors disagreed.

wizzwizz4 708 days ago

In principle, yes, but empirically? They can't do this reliably, even if all the texts fit within the context window. (They can't even reliably answer the question "what does author X say about Y?" – which, I agree, they should be able to do in principle.)

john-tells-all 709 days ago

That's really insightful! Thanks.

bl0rg 709 days ago

Can you explain what it means to reason about something? Since you are so confident I'm guessing you'll find it easy to come up with a non-contrived definition that'll clearly include humans and future "actual AI" but exclude LLMs.

stoperaticless 709 days ago

Not the parent, but there are a couple of things current AI lacks:

- learning from a single article/book with lasting effect (accumulation of knowledge)

- arithmetic without unexpected errors

- gauging the reliability of the information it’s printing

BTW, I doubt that you’ll get a satisfactory definition of “able to reason” (or “conscious” or “alive” or “chair”), as these terms define an end or direction of a spectrum rather than an exact cut-off point.

Current LLMs are impressive and useful, but given how often they spout nonsense, it is hard to put them into the “able to reason” category.

mewpmewp2 709 days ago

> learning from a single article/book with lasting effect (accumulation of knowledge)

If you mean without training the model, it can be done by using RAG and letting the LLM decide what to keep as learnings to come back to later. There are various techniques for RAG-based memory/learning. It's a combination of querying the memory that is relevant to the current goal, keeping the most recent info in memory, compressing or progressively throwing out old info, and assigning importance levels to different "memories". Kind of like humans, honestly.
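A very rough sketch of that kind of memory, under loud assumptions: `MemoryStore` is a hypothetical class, word overlap stands in for the embedding similarity a real RAG setup would use, and the importance values are supplied by the caller rather than by an LLM. Compression of old memories is left out.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float  # 0.0 (trivial) to 1.0 (critical), set by the caller
    created_at: int    # logical clock tick when the memory was stored


class MemoryStore:
    def __init__(self, capacity: int = 3, decay: float = 0.9):
        self.capacity = capacity
        self.decay = decay  # per-tick recency decay factor
        self.clock = 0
        self.items = []

    def _retention(self, m: Memory) -> float:
        """Importance discounted by age: old, unimportant items score lowest."""
        age = self.clock - m.created_at
        return m.importance * (self.decay ** age)

    def add(self, text: str, importance: float) -> None:
        self.clock += 1
        self.items.append(Memory(text, importance, self.clock))
        if len(self.items) > self.capacity:
            # Progressively throw out the memory least worth keeping.
            self.items.remove(min(self.items, key=self._retention))

    def query(self, goal: str, k: int = 2) -> list:
        """Return up to k memory texts most relevant to the current goal."""
        goal_words = set(goal.lower().split())

        def score(m: Memory) -> float:
            # Word overlap is a crude stand-in for semantic similarity.
            relevance = len(goal_words & set(m.text.lower().split()))
            return relevance + self._retention(m)

        ranked = sorted(self.items, key=score, reverse=True)
        return [m.text for m in ranked[:k]]
```

The retrieved texts would then be placed in the prompt alongside the current task, so the model "remembers" them without any retraining.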

> arithmetic without unexpected errors

That's a bit handwavy, because humans make plenty of unexpected errors when doing arithmetic.

> gauging the reliability of the information it’s printing

Arguably, most people are also not very good at gauging the reliability of whatever they output. Also, you can actually make it do that with proper prompting. You can make it debate itself, and finally let it decide the winning decision and confidence level.

Kiro 709 days ago

Go look at the top comment of this thread: https://news.ycombinator.com/item?id=40900482

That's the kind of stuff I want to see when opening a thread on HN, but most of the time we get shallow snark like yours instead. It's a shame.

p1esk 709 days ago

LLMs are trained to predict the next word in a sequence. As a result of this training, they have developed reasoning abilities. Currently these reasoning abilities are roughly at human level, but next-gen models (GPT-5) should be superior to humans at any reasoning task.

soist 709 days ago

How did you reach these conclusions and have you validated them by asking these superior artificial agents about whether you're correct or not?

cowsaymoo 709 days ago

The vocabulary used here doesn't have sufficient intrinsic dimension to partition the input into a low loss prediction. Improvement is promising with larger context or denser attention.