by nicklecompte 910 days ago
AI is not even close to having true logical reasoning, that's probably decades away. The issue is that cognitive scientists are clueless. Scientists have a good model for associative reasoning, which is the basis of modern neural networks, but we don't have a clue how abstract reasoning actually works. All birds and mammals have advanced abstract reasoning and are far more intelligent than GPT-4:

- Birds and mammals are inherently able to count in almost any context because they understand what numbers actually mean; GPT-4 can only be trained to count in certain contexts. GPT-4 would be like a pigeon that could count apples, but not oranges, yet biological pigeons can count anything they can see, touch, or hear. There's a profound gap in true quantitative reasoning, even if GPT-4 can fake this reasoning on specific human math problems.

- Relatedly, birds and mammals are far faster at general pattern recognition than GPT-4, unless it has been trained to recognize that specific pattern.

- Birds and mammals can spontaneously form highly complex plans; GPT-4 struggles with even the simplest plans, unless it has been trained to execute that specific plan.

The "trained to do that specific thing" is what makes GPT-4 so much dumber than warm-blooded vertebrates. When we test the intelligence of an animal in a lab, we make sure to test them on a problem they've never seen before. If you test AI like you test an animal, AI looks incredibly stupid - because it is!

There was a devastating paper back in 2019[1] proving that Google's BERT model - which at the time was world-class at "logical reasoning" - was entirely cheating on its benchmarks. And another paper from this year[2] demonstrates that LLMs definitely don't have "emergent" abilities, AI researchers are just sloppy with stats. It is amazing how much bad science and wishful thinking has been accepted by the AI community.

[1] https://arxiv.org/abs/1907.07355

[2] https://arxiv.org/abs/2304.15004

4 comments

As a current researcher in the field I am perpetually annoyed by the overeagerness of AI research to make fantastical claims. Reading and extracting information from papers is a minefield, and we've learned to always at least A/B test the conclusions of any technique that is supposedly proven to be useful. Even foundational papers about basic concepts in LLMs, for example, can sometimes boil down to "this worked well on our cherrypicked tests."

Yeah, the "logical reasoning" in LLMs is mostly a marketing device to get products sold and papers published. One could hope that starting with reasoning instead of trying to get it to "emerge" would do a better job. But if we have little idea of how abstract thinking actually works, this is a problem. What do you think about current logic-based AI approaches? Do they try to replicate the best ideas we've got from the cognitive sciences, or are they trying to do their job for them?

> One could hope that starting with reasoning instead of trying to get it to "emerge" would do a better job.

We did AI starting with reasoning (directly implementing rules of propositional logic) first, it is called expert systems.

It works very well for some things, but after some efforts to expand it with things like fuzzy logic it became pretty much accepted that we'd reached its limit.
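
For readers who haven't seen one, here is a minimal sketch of the rule-based style being described: a forward-chaining loop that keeps applying "if premises then conclusion" rules until nothing new can be derived. The rule set and fact names are made up for illustration, and real expert systems were far more elaborate than this.

```python
# Minimal forward-chaining sketch. RULES and the fact names are hypothetical,
# chosen only to illustrate the idea of hand-written logical rules.
def forward_chain(facts, rules):
    """Apply (premises -> conclusion) rules until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule only if every premise is already known.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

RULES = [
    (("has_fever", "has_cough"), "suspect_flu"),
    (("suspect_flu",), "recommend_rest"),
]

derived = forward_chain({"has_fever", "has_cough"}, RULES)
print(sorted(derived))
```

Everything the system "knows" has to be written down as a rule by a person, which is roughly where the scaling trouble comes from.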

You could hope that it would work better, but...

There's a distinction between "constrain the AI's output with logical rules to make it more reliable" and "build logical reasoning into the AI." The current strategies are trying to do the first task and I bet it'll lead to all sorts of cool technology. I strongly doubt these techniques will extend to the actual logical reasoning. Intuitively, it feels like throwing a bunch of logical rules onto AI is begging the question - I doubt bird/mammal brains actually have these logical rules baked in, I am sure it's far more sophisticated.

A trivial theorem in logic gives an example of what I mean:

If A then B <=> If (not B) then (not A)

This is really not how humans think - I don't believe we have a "contrapositive calculator" in our brain that takes arbitrary situations in and computes a contrapositive. This contrapositive theorem is a fact of the world that humans used logical thinking to understand, and which can be applied to formal logical computations that human brains aren't necessarily good at.

Specifically, I don't think non-human animals have "logical" thinking at all, they have causal thinking, and human logic is a consequence of us having exceptionally good understanding of causality. Logic is itself a special case of causality, formalized in a "generic" fashion by human language and used as a tool to help us think through tricky cases.

The contrapositive theorem takes a bit of thought for me to unwind - "so if B is not true then of course A can't be true" - but the way contrapositives are reflected in the real world takes no thought whatsoever, even if the examples are more algebraically complicated than A->B <=> (~B)->(~A):

- if the door is working and I have a key that can unlock the door, then if I can't unlock the door either I don't have the key or the door is broken. (A^B)->C <=> (~C)->(~A v ~B)

- if having gas implies my car can drive, then if my car can't drive I don't have gas - or possibly I was incorrect and my car is broken. (A->B <=> (~B)->(~A)) v ~(A->B)
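
The formal side of these examples is easy to check mechanically, which is part of the point: a brute-force truth table does it. This is only a sketch; the helper names `implies` and `equivalent` are made up here, and the door example is encoded as "working door AND key implies unlock."

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

def equivalent(f, g, n_vars):
    """True if f and g agree on every assignment of n_vars booleans."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))

# A -> B  <=>  (~B) -> (~A)
contrapositive_ok = equivalent(
    lambda a, b: implies(a, b),
    lambda a, b: implies(not b, not a),
    2,
)

# Door example: (A and B) -> C  <=>  (~C) -> (~A or ~B)
door_ok = equivalent(
    lambda a, b, c: implies(a and b, c),
    lambda a, b, c: implies(not c, (not a) or (not b)),
    3,
)

# Sanity check: the converse B -> A is NOT equivalent to A -> B.
converse_ok = equivalent(
    lambda a, b: implies(a, b),
    lambda a, b: implies(b, a),
    2,
)

print(contrapositive_ok, door_ok, converse_ok)  # True True False
```

Enumerating eight rows is obviously not what is happening in anyone's head when a key fails to turn.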

These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff. I don't think feeding this limited human language into a computer is going to burst past these limits. We need to figure out how bird/mammal brains actually model things causally.

> These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff.

I don't follow this. Didn't you just express these cases in human language? I understand that in reality we can "grasp" the meaning of a problem like not being able to open the door without expressing or thinking about it verbally, which would be redundant as there would be a lot to say (the key may be broken, the door may be held by someone on the other side, even if the key works we might be trying to push instead of pull, etc, etc.) and any person who has opened doors with keys would likely understand all of this. The problem is not that those things can't be expressed in human language, but the lack of ability to build good conceptual models of the world that encompass all such knowledge and allow reasoning on it quickly.

I didn't mean the specific cases, I meant the underlying mechanism that our brain uses to reason about these cases. There is something deeper going on that allows us to build rigorous world models from very thin abstractions, which can be applied to a seemingly arbitrary range of problems. It's this rigorous world model which is absent in AI and not currently explained by cognitive science.

In this example, the overall world model is able to easily accommodate "broken door" "functioning door" "key" etc. and come to a specific conclusion about this problem. The specific conclusion can be easily expressed in human language. The world model itself can't.

Aren't animals trained to do all of those things through evolution? Similarly how GPT is trained.

Also how do you prove that GPT is worse at counting?

Because GPT can currently count both apples and oranges.

> Also how do you prove that GPT is worse at counting?

Back in June 2023 GPT-4 was dramatically worse at counting than a pigeon in the sense that it couldn't accurately tell the difference between sentences with 3 words and sentences with 5 words, whereas pigeons can count almost anything up to about 10. It also routinely failed "pick the shorter sentence" tests which I literally took from a test administered to mice. GPT simply doesn't understand what numbers are, whereas pigeons and mice have an intuitive understanding similar to toddlers. You don't need to teach kids what 3 means, you just need to teach them the human symbol for the concept of 3. GPT only has the human symbol and does not seem capable of understanding the concept.

In my testing GPT-4 consistently failed counting / pattern-recognition tests even if you used "chain-of-thought" prompting. As far as I could tell its only true understanding of numbers was "one, two, many." This seems reflected in real use cases, where GPT routinely (and hilariously) ignores commands to return 50 words/etc of output. GPT doesn't know what fifty means, it just knows what various documents that say "word count: 50" look like, and tries to imitate the tone.

Since transformer neural networks lack recursion I conjecture that GPT will never be able to understand a number larger than 2, even if in specific cases it can solve counting problems up to eleventy billion. This is what I mean by "counting apples, not oranges," its sense of counting is paper-thin and easily fooled by adversarial prompts. It is much harder to fool a mouse or a pigeon.

Many of the tests I ran back in April 2023 no longer work. I strongly suspect this is because OpenAI trained GPT on many of the tests that people were throwing at it, and not because GPT actually became "smarter." I stopped messing around with GPT specifically because OpenAI doesn't issue any release notes, making replicability impossible. Mistral's 7B model was dramatically worse than even GPT-3 at counting, but I doubt they trained it to count. Not sure about LLaMa/etc.

When you are talking about "counting", do you mean the logical process of going "one", "two", "three"... or do you mean the ability to statistically estimate a quantity from the amount of signal you are processing?

E.g. are pigeons actually "counting" in the sense of the process humans use to arrive at an exact number? Or are they just responding to the signal? Similar to how a person can tell whether a sound is higher or lower in pitch without being able to state its exact frequency.

Because to me pigeons are just similarly responding to the amount of "signal" they are receiving, not actually doing abstract reasoning.

And looking at the science studies, it also seems that they had to train pigeons to be able to count, they weren't able to do it out of the box.

By the way, your criticism of GPT's ability to count the words in a sentence seems quite odd to me, because the input that GPT receives is actually tokens, not the words you give it.

So then imagine if someone asked you a question in English, and then translated it to hieroglyphs, and you didn't know English. Would you be able to count how many words were there in the original English?

So it seems weird to expect that GPT would be able to count in the first place.

However, if it were later taught how many words a given combination of tokens corresponds to, it would be able to do that. So perhaps this is what it was taught in the meantime, resulting in that better ability to count words?
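
To make the token point concrete, here is a toy sketch of a greedy longest-match subword tokenizer. The vocabulary is invented for illustration and is not the vocabulary of GPT or any real model; real tokenizers (e.g. byte-pair encoding) differ in detail, but the mismatch between word boundaries and token boundaries is the same kind of thing.

```python
# Toy greedy longest-match subword tokenizer.
# VOCAB is hypothetical, not taken from any real model.
VOCAB = {"count", "ing", "pig", "eon", "s", " ", "are", "good", "at"}

def tokenize(text, vocab=VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No entry matched: emit the single character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

sentence = "pigeons are good at counting"
tokens = tokenize(sentence)
word_count = len(sentence.split())

print(tokens)
print(word_count, len(tokens))  # 5 words, 12 tokens
```

The model only ever sees the token sequence, so "how many words" is a property it has to infer rather than read off.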

Thirdly, GPT with Vision can count objects in an image very well, no matter what the objects are. Does it make mistakes? Sometimes, when objects are not clearly visible, but so would humans and pigeons.

The biological neural structures that encode behavior are “trained” through evolution, but even the most advanced animals rely mostly on conditioned (= learned during lifetime) reflexes, and not on the ones “hardcoded” evolutionarily.

Certainly not much evolutionary “training” in the human brain has happened in the last 3000 years, yet advancement in our understanding of the world has been plentiful. But human thinking (including rationality, mathematics, etc.) is on a different level to even learned animalistic behavior. Some great apes were taught language and even showed basic abstract conceptual thinking, but were never able to reach the level of 3-5 year old human kids.

The problem with GPTs and other statistical models is that they can learn incredibly complex patterns in anything we can express as bytes, but not learn the simplest concepts of maths despite being trained on the whole corpus of mathematical texts available on the internet, while kids need classes that can be covered in a single textbook to understand them, and adults may need just a textbook for this.

About the claim that those models are "statistical": why wouldn't you consider the human brain, or animals' brains, to be "statistical" too?

Because in the end it seems that human brains, like any brains, could be thought of as the statistical result of long periods of training, producing output from input. Where am I wrong?

I assume by statistical you mean that the end state of the neurons can be represented as numbers, and the pathways through them as probabilities of taking a certain path - but it occurs to me that the same is true of the human brain, no?

You are not wrong, but too abstract. Saying that the brain is statistical and therefore can be represented with a statistical model such as an artificial neural net is like saying that the brain is computational and therefore can be represented with a Turing-complete system, or that the brain is physical and can be represented via physical simulation, etc.

> Aren't animals trained to do all of those things through evolution? Similarly how GPT is trained.

No. Animals evolved and are able to do those things. Evolution is not training, and evolution has approximately zero to do with how transformers work.

What's the difference here?

Both seem to have adaptive neural networks that change over time in response to a reward - for animals, mutated genes are more likely to be passed on if the change was good. Over millions of generations it's statistically likely that more of the good genes, the ones that put the neural networks in a state better able to solve problems within the environment, get passed on, eventually resulting in an emergent intelligence. In training you similarly change the state of the neural network depending on whether the answer is good or bad.

Evolution operates on genes, which do not encode synaptic connections for one thing. The analogy you're making here is so stretched it's hard to begin to say what's wrong with it. Backpropagation and natural selection are about as different as two things can be. About the only thing you can say they have in common is that both can be modeled as optimization processes.

What's the difference between a star and a bonfire? Both use fuel and produce heat and light.
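
The "both can be modeled as optimization processes" point can be shown with a toy sketch: minimizing the same one-dimensional loss once with gradient steps and once with random mutation plus selection. The loss function, step sizes, and function names are all invented for illustration, and neither loop is a model of real backpropagation or real natural selection.

```python
import random

def loss(x):
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2

def gradient_descent(x, lr=0.1, steps=50):
    """Follow the exact gradient of the loss downhill."""
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)
        x -= lr * grad
    return x

def mutate_and_select(x, sigma=0.5, steps=200, seed=0):
    """Propose a random mutation; keep it only if it lowers the loss."""
    rng = random.Random(seed)
    for _ in range(steps):
        child = x + rng.gauss(0.0, sigma)
        if loss(child) < loss(x):
            x = child
    return x

gd_x = gradient_descent(0.0)
es_x = mutate_and_select(0.0)
print(loss(gd_x), loss(es_x))
```

Both loops end up near the minimum, but one uses gradient information about the loss and the other only compares outcomes, which is about as far as the resemblance goes.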

I mean the point was about LLMs not truly being problem solvers because they were trained to do so as opposed to having evolved. I'm looking for what the difference is specifically within that dimension. Biological pigeons went through their own evolutionary process to arrive at the kind of neural networks and systems that give them the ability to count - but surely not in all contexts.

So yes, my point is that both have an optimisation process that over time gives them those emergent capabilities.

No, the point was that LLMs are not good at problem solving because they are not good at problem solving, not because they were not evolved. We don't understand how we or animals solve problems, which is why we haven't yet succeeded in replicating that in AI. You're the one bringing in these incredibly strained analogies, because you want GPT-4 to be more than it is or something, I'm not sure.

The biggest difference is that training does not change the size or architecture of an artificial neural network, but biological evolution dramatically changes the size and architecture of animals' brains.

Your comparison is sincerely vacuous. It vaguely makes sense if you're talking about GPT-3 to GPT-4 (though I don't think it's helpful). It makes no sense if you're talking about training a single neural network.

I mean that exposure to a lot of training material resulted in the final set of capabilities. Pigeons and their ancestors were exposed to certain situations throughout evolution that resulted in the formation of a neural network and its ability to "count". Which I believe is not actual "one, two, three", but just the number of signals being activated resulting in a certain output from the pigeon. That is different from how a human counts, except for small numbers, where you can intuitively come up with the number immediately.

The training material was the situations organisms had to produce output for. If the output was good their genetics survived, eventually forming a neural network that was able to handle this training material well, and similarly producing emergent behaviour like being able to "count".

But GPT-Vision can easily do what a pigeon can as well. What's the exact thing that implies the pigeon is doing it somehow more intelligently?

If you ask either of them for the quantity of something in a picture, I'm pretty sure both respond to the amount of that type of signal received, whether through light waves or through pixels encoded for GPT.

> Aren't animals trained to do all of those things through evolution? Similarly how GPT is trained.

If you interpret this question at the most abstract level of "aren't both solutions arrived at through training/trial+error method?" - then the answer is probably yes, they are both arrived at in some conceptually similar manner.

But they are two very different underlying systems and we don't really understand the biological systems well enough to even be able to truly compare.

Beyond that, it seems that humans (switching to humans from pigeons) have some sort of representation/understanding of the world around us such that even if we produce the same result as ChatGPT to a counting question, the information stored within our systems is not equivalent.

Evidently pigeons can count to 9. When I asked ChatGPT-4 to identify the irrational statements in this comment, it said there were 9, and I'm pretty sure pigeons can't tell if something is irrational or not.