Hacker News
by pawelmurias 751 days ago
LLMs are idiots. They can't reason properly and only parrot stuff

https://chatgpt.com/share/dcb4ff4e-e8a2-463b-86ec-9caf10b6e6...

Sometimes they get the answer right to something really complex because it fits a pattern, but sometimes they answer with something really really stupid.

Why are so many people so insistent on saying this?

I’m guessing you are in denial that we can make a simulated reasoning machine?

People keep saying it because that's literally how LLMs work. They run Monte Carlo sampling over a very impressive latent linguistic space. These models are not fundamentally different from the Markov chains of yore except that these latent representations are incredibly powerful.

We haven't even started to approach the largest problem, which is moving beyond what is essentially a greedy token-level search of this linguistic space. That is, we can't really pick an output that maximizes the likelihood of the entire sequence; rather, we're simply maximizing the likelihood of each part of the sequence.
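To make the greedy-versus-whole-sequence distinction concrete, here is a minimal sketch. The table `MODEL` and its probabilities are invented for illustration and do not come from any real LLM; the function names are hypothetical too. It only shows that picking the most likely token at each step can miss the most likely overall sequence.

```python
# Hypothetical toy "language model": P(next token | previous token).
# All tokens and probabilities are made up for illustration.
MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def greedy_decode(start="<s>", steps=2):
    """Pick the single most likely next token at every step."""
    tokens, prob, prev = [], 1.0, start
    for _ in range(steps):
        tok, p = max(MODEL[prev].items(), key=lambda kv: kv[1])
        tokens.append(tok)
        prob *= p
        prev = tok
    return tokens, prob


def all_sequences(prev, steps):
    """Yield (tokens, probability) for every sequence of `steps` tokens."""
    if steps == 0:
        yield [], 1.0
        return
    for tok, p in MODEL[prev].items():
        for rest, q in all_sequences(tok, steps - 1):
            yield [tok] + rest, p * q


def exhaustive_decode(start="<s>", steps=2):
    """Return the sequence with the highest total probability."""
    return max(all_sequences(start, steps), key=lambda sp: sp[1])


if __name__ == "__main__":
    print("greedy:    ", greedy_decode())
    print("exhaustive:", exhaustive_decode())
```

In this toy table greedy decoding commits to "a" first because 0.6 beats 0.4, but the sequence starting with "b" ends up more likely overall. Exhaustive search is only feasible here because the table is tiny; the number of candidate sequences grows exponentially with length.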

LLMs are not reasoning machines. They are basically semantic compression machines with a built-in search feature.

> LLMs are not reasoning machines. They are basically semantic compression machines with a built-in search feature.

This is just a god of the gaps argument. Understanding is a form of semantic compression. So you're saying we have a system that can learn and construct a database of semantic information, then search it and compose novel, structured and coherent semantic content to respond to an a priori unknown prompt. Sounds like a form of reasoning to me. Maybe it's a limited deeply flawed type of reasoning, not that human reason is perfect, but that doesn't support your contention that it's not reasoning at all.

It’s basically an argument that boils down to “it’s not because I don’t like it”
I bite the bullet on the god of the gaps
The best compression is some form of understanding
The best compression relies on understanding. What an LLM is is mostly data about how humans use words. We understand how to make this data (which is a compression of human text) and use it (generate something). AKA it’s “production rules”, but statistical.

The only issue is ambiguity. What can be generated strongly depends on the order of the tokens. A slight variation can change the meaning and make the result worthless. Understanding is the guardrail against meaningless statements, and LLMs lack it.

You seem to entirely miss how attention layers work...
That's a fascinating insight and it sounds so true!

Can you compress for me Van Gogh's Starry Night, please? I'd like to send a copy to my dear old mother who has never seen it. Please make sure when she decompresses the picture she misses none of the exquisite detail in that famous painting.

Okay, yes, so not really having an artist's vocabulary, I couldn't compress it as well as someone who has a better understanding of Starry Night. An artist who understands what makes Starry Night great could create a work that evokes similar feelings and emotions. I know this because Van Gogh created many similar works playing with the same techniques, colors, and subjects, such as Cypresses in Starry Night and Starry Night over the Rhone. He was clearly working from a concise set of ideas and techniques, which I would argue is understanding/compression.
Fine, but we were talking about compression, not about imitation, or inspiration, and not about creating "a work that evokes similar feelings and emotions". If I compress an image, what I get when I decompress it is that image, not "feelings and emotions", yes? In fact, that's kind of the whole point: I can send an image over the web and the receiver can form their own feelings and emotions, without having to rely on mine.
I don't think you can evaluate whether an LLM is reasoning by looking purely at the mechanics, because if we looked inside a human brain we wouldn't be able to conclude that it can reason either (our test is 'I think, therefore I am', not 'all these neurons look like they are plugged together in such a way that it enables reason').
Exactly right and well said.
This type of self-affirmation has a quality of denial.

Also the above description is reductive to the point of "Cars can't get you anywhere because they aren't horses."

Beam search.

Sophisticated folks aren't doing simplistic/stupid decoding.

Gotta go beyond LLMs 101 to see what's actually happening. Even in training folks are building models which predict several tokens ahead.
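For readers who haven't seen beam search, here is a minimal sketch of the idea. The table `MODEL` and its probabilities are invented for illustration, the function name is hypothetical, and this is not how any particular production system decodes. Instead of keeping only the single best token at each step, it keeps the `beam_width` best partial sequences.

```python
import math

# Hypothetical toy "language model": P(next token | previous token).
# All tokens and probabilities are made up for illustration.
MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def beam_search(start="<s>", steps=2, beam_width=2):
    """Keep the `beam_width` most likely partial sequences at each step."""
    # Each beam is (tokens so far, summed log-probability, last token).
    beams = [([], 0.0, start)]
    for _ in range(steps):
        candidates = []
        for tokens, logp, prev in beams:
            for tok, p in MODEL[prev].items():
                candidates.append((tokens + [tok], logp + math.log(p), tok))
        # Sort by score and prune to the best `beam_width` candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_tokens, best_logp, _ = beams[0]
    return best_tokens, math.exp(best_logp)


if __name__ == "__main__":
    print("width 1:", beam_search(beam_width=1))  # same as greedy decoding
    print("width 2:", beam_search(beam_width=2))
```

With `beam_width=1` this reduces to greedy decoding. A wider beam explores more of the sequence space, but it is still an approximation: nothing guarantees the globally most likely sequence survives pruning.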

It is hard to trust any output from a machine that is confidently wrong so frequently. You need to already be knowledgeable in a topic (or at least have a well-attuned BS detector) to know if it is giving you correct responses. It can be a time saver and assistant in getting work done where you are already a subject matter expert, but it needs to get better to remove the human from the loop.
No, it is because supervised and self-supervised learning happen to produce reasoning as a byproduct. For some reason people think that telling a model to recite a trillion tokens somehow will improve it beyond the recitation of those tokens. I mean, in theory you can select the training data so that it will learn what you want, but then again you are limited to what you taught it directly.

The problem is that these models weren't trained to reason. For the task of reasoning, they are overfitting to the dataset. If you want a machine to reason, then build and train it to reason, don't train it to do something else and then expect it to do the thing you didn't train it for.

> The problem is that these models weren't trained to reason.

Except they kind of were. Specifically, they were trained to predict next tokens based on text input, with the optimization function being: does the result make sense to a human? That's embedded in the training data: it's not random strings, it's the output of human reasoning, both basic and sophisticated. That's also what RLHF selects for later on. The models are indeed forced to simulate reasoning.

> don't train it to do something else and then expect it to do the thing you didn't train it for.

That's the difference between AGI and specialized AI - AGI is supposed to do the things you didn't train it to do.

I think people don’t recognize it’s currently doing single turn reasoning and demonstrating the building blocks of real time reasoning with continuous input.

If we tested humans on first-thought answers, given in 5 seconds or less, on half the problems we give LLMs, we might prove humans can’t reason as well.

Maybe people have different experiences with the products than you.

A simulated reasoning machine being possible does not mean that current LLMs are simulated thinking machines.

Maybe you should try asking chatgpt for advice on how to understand other people’s perspectives: https://chatgpt.com/share/3d63c646-859b-4903-897e-9a0cb7e47b...

This is such a weirdly preachy and belligerent take.

Obviously that was implied in my statement. Dude, we aren’t all 4-year-olds who need a self-righteous lesson.

Weird to accuse a response of being belligerent when your initial comment stated that people who disagreed with you are in denial.

What was implied by your statement? That you don’t understand other people’s perspectives?

Because they understand how LLMs work. It's not reasoning. It's not simulating reasoning.
There's some irony in seeing people parrot the argument that LLMs are parrots.
Also making errors in reasoning while saying LLM errors prove it can’t reason.
> I’m guessing you are in denial that we can make a simulated reasoning machine?

Some people actually try, and see that LLMs are not there yet.