by Culonavirus 1143 days ago
Its vast limitations in anything reasoning-based are indeed evident.

GPT-4 is better at reasoning than 90% of humans. At least. I won't be surprised if GPT-5 is better than 100% of humans. I'm saying this in complete seriousness.
Google search is better at reasoning than most humans, in that if you search for an explanation of something then Google's first result is often correct, or one of the following ones is.

GPT-4 will often come up with a solution to a problem, but only if it has learnt something similar (it's better than Google in some respects: it can extract and combine abstractions).

However, both need handholding by a human (supplying the initiative and directing around mistakes).

If GPT-4 can't intuit an answer then it just goes in circles. It can't reason its way through a novel problem. If you start questioning it then it's clear that it doesn't understand what it's doing.

It might be a stepping stone towards AGI, but I'm a bit bemused by anyone claiming that it has anything like the reasoning skills of a human. That is far from the impression I get, even though I find it a useful tool.

OK, you have a point. But, if you replace "GPT-4" with "an average human" in your response - it still makes sense.
When you ask a question of "an average human", they will reply with a sentence answering your question, or maybe asking for clarifications, etc. When you ask a question of GPT-4, it generates one token.

One. Token.

One token, that is not an answer to your question, that is not asking for clarifications, that is not trying to reason about your question, or how to answer it, because it's just one token.

One. Token.

For example, say you ask:

  "What is the sieve of Eratosthenes? Speak in chicken".
Then GPT-4 will answer with something like:

  Chicken
You don't see that because you don't interact with the GPT-4 model directly. You interact with ChatGPT, a "wrapper" around GPT-4, that takes your prompt, passes it to the model, then takes the token returned by the model and decides what to do with it. In the example above, ChatGPT will take the " Chicken" token generated by GPT-4 and append it to your question:

  What is the sieve of Eratosthenes? Speak in chicken. Chicken
Then it will send this new string back to the model, which will generate a new token:

  chicken
And then it will go like this:

  Iteration 2: What is the sieve of Eratosthenes? Speak in chicken. Chicken chicken

  ... 

  Iteration k: What is the sieve of Eratosthenes? Speak in chicken. Chicken chicken chicken chicken chicken chicken chicken chicken chicken chicken ...
At no point is GPT-4 trying to reason about your question, or trying to answer it, or doing anything other than generating one. token. at a time. There's no thinking, no reasoning, no calculation, no logic, no deduction, no intelligence, no anything. It's only token, token, token. Chicken, chicken, chicken.

And do you know when the chickens stop? When GPT-4 generates a special and magickal token, called a stop-token (or a "stop sequence" in OpenAI docs). That's a token, not found in the training corpus, added to the end of every string during tokenisation. That's how ChatGPT knows to stop sending your prompt plus the generated tokens back to the model. It can't look back at what GPT-4 has generated so far, because it doesn't understand any of it. It doesn't understand anything, and therefore cannot reason about your question, or realise it has answered it. It cannot do anything except a) ask GPT-4 to generate another token or b) stop asking for more tokens.

  "What is the sieve of Eratosthenes? Speak in chicken". Chicken, chicken, chicken, chicken, .... chicken, stop_token!
No more chickens.

And that's how GPT-4 explains what the Sieve of Eratosthenes is, but in chicken.
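A toy Python sketch of the loop described above. `generate`, `make_toy_model` and `STOP` are made-up names, not OpenAI's API, and the "model" is a hard-coded function rather than a neural network; real systems work on integer token IDs and usually pick each token from a probability distribution. It only illustrates the control flow: one token per call, appended to the context, until a stop token (with a length cap as a safety net).

```python
from typing import Callable, List

# Hypothetical name for the stop token; real tokenizers use their own
# special token (an integer ID), not this string.
STOP = "<|stop|>"

NextToken = Callable[[List[str]], str]


def generate(next_token: NextToken, prompt: List[str], max_tokens: int = 50) -> List[str]:
    """The wrapper loop: ask the model for one token, append it to the
    context, and repeat until the stop token appears or max_tokens is hit."""
    context = list(prompt)
    output: List[str] = []
    for _ in range(max_tokens):
        token = next_token(context)  # the model returns exactly one token
        if token == STOP:            # the only decision the loop makes
            break
        context.append(token)        # prompt + generated tokens go back in
        output.append(token)
    return output


def make_toy_model(prompt_len: int, n: int) -> NextToken:
    """Hypothetical stand-in for GPT-4: emits 'Chicken', then 'chicken'
    until n tokens have been generated, then the stop token."""
    def next_token(context: List[str]) -> str:
        generated = len(context) - prompt_len
        if generated >= n:
            return STOP
        return "Chicken" if generated == 0 else "chicken"
    return next_token


prompt = "What is the sieve of Eratosthenes ? Speak in chicken .".split()
model = make_toy_model(len(prompt), n=4)
print(" ".join(generate(model, prompt)))  # Chicken chicken chicken chicken
```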

So what you see as a user is like watching a movie where a plate of spaghetti is flying through the air, lands on a tablecloth, the tablecloth lands on a table, the table lands on the floor, four chairs land around it and suddenly there's people in the chairs eating the spaghetti. It's not that someone has figured out how to reverse time: it's a recording, played backwards. It looks like things are going backwards, but they're not.

It looks like ChatGPT is answering your questions, but it's not. It looks like ChatGPT is interacting with you, but it's not. It's a magick trick.

Prompt: What is the sieve of Eratosthenes? Speak in chicken

Answer from GPT-4:

Sure, here's the explanation of the sieve of Eratosthenes explained in a simplified "chicken" style:

Imagine you are a chicken. You have lots of chicken friends. Let's say there are 100 of you. Now, you all stand in a line. Each chicken is numbered, from 2 up to 100. Chicken number 1 decided to stay at home because he's a bit special in this game.

Chicken number 2, a very important chicken, starts to crow. Every chicken in line after number 2 who has a number that is a multiple of 2 (so every second chicken from chicken number 2) gets scared and runs away.

Then chicken number 3, the next chicken in line who didn't run away, starts to crow. Every chicken in line after number 3 who has a number that is a multiple of 3 (so every third chicken from chicken number 3) also gets scared and runs away.

This continues with the next chicken in line who didn't run away, and so on. Each chicken scares away all the chickens whose numbers are multiples of their own number.

At the end, the chickens that are still in line, who didn't run away, are very special chickens. Their numbers are what humans call prime numbers. These are numbers that can only be divided by 1 and themselves.

This whole chicken game is like the Sieve of Eratosthenes. It's a way that humans have figured out to find prime numbers. You start from the lowest number (2) and eliminate multiples of each number. The numbers that are left over are the primes.

And that, in chicken speak, is the Sieve of Eratosthenes!
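For reference, the game in the answer above is the standard Sieve of Eratosthenes. Below is a minimal Python sketch (the function name is just illustrative). It adds two textbook shortcuts the chicken version leaves out: crossing off starts at p*p, because smaller multiples of p were already removed by smaller primes, and the outer loop stops once p*p exceeds the limit, since every composite number up to the limit has a prime factor no larger than its square root.

```python
from typing import List


def sieve_of_eratosthenes(limit: int) -> List[int]:
    """Return every prime <= limit."""
    if limit < 2:
        return []
    # still_in_line[i] is True while "chicken i" has not run away
    still_in_line = [True] * (limit + 1)
    still_in_line[0] = still_in_line[1] = False  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if still_in_line[p]:
            # chicken p crows: its multiples run away (starting at p*p,
            # since smaller multiples were removed by smaller primes)
            for multiple in range(p * p, limit + 1, p):
                still_in_line[multiple] = False
        p += 1
    return [i for i, ok in enumerate(still_in_line) if ok]


print(sieve_of_eratosthenes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```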

Bad prompt. Try "Only use the word chicken" instead of "Speak in chicken".
> There's no thinking, no reasoning, no calculation, no logic, no deduction, no intelligence, no anything. It's only token, token, token.

False. The neural network inside the transformer LLM contains a hierarchical semantic model, and has inferred some rules of reasoning from the training set. It can apply those rules to new input.

There are semantic layers above the "token token tokens".

Explore them here: https://openaipublic.blob.core.windows.net/neuron-explainer/...

What you're commenting here is simply repeating, without critical intent, the baseless claims connectionists have made about their systems for many decades. Similarly, those claims have been criticised, but connectionists simply ignore the criticisms and continue with the same old nonsense, as if nothing happened. For example, that ridiculous conceit that their systems have "neurons", or that the weights of functions in a neural net somehow represent semantic categories recognised by humans. These are all complete fantasies.

If you are not aware of the long history of debunking such fabrications, I suggest you start here:

Connectionism and Cognitive Architecture: A Critical Analysis

https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/pro...

Do you put yourself in the 10% or the 90%? I’m asking in complete seriousness.
Oh it's definitely better than me at reasoning. I'm the one asking it to explain things to me, not the other way around.
If you think it's better than you at reasoning then you cannot at all be confident in the truth of its dialog.
I am not. I treat it as I'd treat any smart human being.
LLMs are not "smart human beings." They are predictive statistical models capable of producing results based on training data.

LLMs do not think.

LLMs are algorithms.

Ah, OK. Here you use the word “explain”, which implies a descriptive, reductive action rather than an extrapolative, constructive one. As in, it can explain what it has “read” (and it has obviously “read” far more than any human), but it can’t necessarily extrapolate beyond that or use it to find new truths. To me, reasoning is more about the extrapolative, truth-finding process, i.e. “wisdom” from knowledge rather than just knowledge. But maybe my definition of “reasoning” isn’t quite right.

Edit: Perhaps I should define reasoning as solely “deductive reasoning”, in which case it may well be better than humans. But that seems like a premature claim. Non-deductive reasoning, on the other hand, I have yet to see from it. I personally can’t imagine how it could do that reliably (from a human perspective) without real-world experiences and perceptions. I’m the sort who believes a true AGI would require a highly perceptual, space-occupying organ. In other words, it would have to be, and “feel”, embodied in time and space in order to perform other forms of reasoning.

Why don't you suggest an example we can run and see what it's capable of (compared to what I, or other humans, are capable of)?
(In case it was missed, I’ve added a relevant addendum to my previous comment.)

Not sure an example is needed because I agree it “explains” better than pretty much everyone. (From my mostly lay perspective) It essentially uses the prompt as an argument in a probabilistic analysis of its incredibly vast store of prior inputs to transform them into an output that at least superficially satisfies the prompter’s goals. This is cool and useful, to say the least. But this is only one kind of reasoning.

A machine without embodied perceptual experiences simply cannot reason to the full-extent of a human.

(It’s also worth remembering that the prompter (very likely) has far less knowledge of the domain of interest and far less skill with the language of communication, so the prompter is generally quite easily impressed regardless of the truth of the output. Nothing wrong with that necessarily, especially if it is usually accurate. But again, worth remembering.)

I intuit the same given the jump we've seen from GPT-3.5 to GPT-4. The next jump will feature considerably more general intelligence than any one human being, though I think it will have certain weaknesses.
>GPT-4 is better at reasoning than 90% of humans.

Even with obvious nonsense that 90% of humans would recognise as such.

> GPT-4 is better at reasoning than 90% of humans. At least.

This makes as much sense as me asserting a binary search can tell one why they are looking for what they want.

That is to say, it cannot.