Ask HN: What is the current state of "logical" AI?
32 points by mtlb 905 days ago
The kind of AI that gets public attention right now lacks a quality that can be described as "formal correctness", "actual reasoning", "rigorous thinking", "mathematical ability", "logic", "explainability", etc.

This is the quality that should be studied and developed in the symbolic AI approach. However, the actual symbolic AI work I know of seems to fall into one of two buckets:

1. "Let's solve a mathematical problem (e.g. winning at chess) and say that the solution is AI" (because humans can play chess, and now computers can too!)
2. "Let's make something like Prolog but with a different solver algorithm / knowledge representation". Products like Cyc and Wolfram seem to work essentially in this manner, although with lots of custom coding for specific cases to make them practical.

There's lots of work on separate aspects of this as well, like temporal and other modal logics.
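For a concrete picture of the second bucket: Prolog answers queries by backward chaining from a goal, and a system in this bucket might swap in a different procedure over the same kind of Horn-clause rules. Below is a minimal sketch of forward chaining over propositional Horn rules; the facts and rules are invented for illustration, and real systems add variables, unification and far smarter indexing.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A Horn rule: if every premise is known, the conclusion becomes known.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Fire every rule whose premises are all known, repeatedly,
    until no new fact is derived (a fixed point)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical toy knowledge base.
rules: List[Rule] = [
    (frozenset({"bird"}), "has_feathers"),
    (frozenset({"bird", "healthy"}), "can_fly"),
    (frozenset({"can_fly", "hungry"}), "looks_for_food"),
]

derived = forward_chain({"bird", "healthy", "hungry"}, rules)
print(sorted(derived))
# ['bird', 'can_fly', 'has_feathers', 'healthy', 'hungry', 'looks_for_food']
```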

I see the first bucket as just applied maths, not really AI. The second bucket is actually aimed at general reasoning, but the approaches and achievements in it are somewhat uninspiring, maybe because I don't know many of them.

So my broad question is: what is happening in such "logical AI" research/development in general? Are there any buckets I missed in the description above, or maybe my description is wrong to begin with? Are there any approaches that seem promising, and if so, how and why?

I would be grateful for suggestions of the books/blogs/other resources on the topic as well.

13 comments

AI is not even close to having true logical reasoning, that's probably decades away. The issue is that cognitive scientists are clueless. Scientists have a good model for associative reasoning, which is the basis of modern neural networks, but we don't have a clue how abstract reasoning actually works. All birds and mammals have advanced abstract reasoning and are far more intelligent than GPT-4:

- Birds and mammals are inherently able to count in almost any context because they understand what numbers actually mean; GPT-4 can only be trained to count in certain contexts. GPT-4 would be like a pigeon that could count apples, but not oranges, yet biological pigeons can count anything they can see, touch, or hear. There's a profound gap in true quantitative reasoning, even if GPT-4 can fake this reasoning on specific human math problems.

- Relatedly, birds and mammals are far faster at general pattern recognition than GPT-4, unless it has been trained to recognize that specific pattern.

- Birds and mammals can spontaneously form highly complex plans; GPT-4 struggles with even the simplest plans, unless it has been trained to execute that specific plan.

The "trained to do that specific thing" is what makes GPT-4 so much dumber than warm-blooded vertebrates. When we test the intelligence of an animal in a lab, we make sure to test them on a problem they've never seen before. If you test AI like you test an animal, AI looks incredibly stupid - because it is!

There was a devastating paper back in 2019[1] proving that Google's BERT model - which at the time was world-class at "logical reasoning" - was entirely cheating on its benchmarks. And another paper from this year[2] demonstrates that LLMs definitely don't have "emergent" abilities; AI researchers are just sloppy with stats. It is amazing how much bad science and wishful thinking has been accepted by the AI community.

[1] https://arxiv.org/abs/1907.07355

[2] https://arxiv.org/abs/2304.15004

As a current researcher in the field, I am perpetually annoyed by the overeagerness of AI research to make fantastical claims. Reading and extracting information from papers is a minefield, and we've learned to always at least A/B test the conclusions of any technique that is supposedly proven to be useful. Even foundational papers about basic concepts in LLMs, for example, can sometimes boil down to "this worked well on our cherry-picked tests".
Yeah, the "logical reasoning" in LLMs is mostly a marketing device to get products sold and papers published. One could hope that starting with reasoning instead of trying to get it "emerge" would do a better job. But if we have little idea of how abstract thinking actually works, this is a problem. What do you think about current logic-based AI approaches? Do they try to replicate the best ideas we've got from cognitive sciences, or are they trying to do their job for them?
> One could hope that starting with reasoning instead of trying to get it "emerge" would do a better job.

We did AI starting with reasoning (directly implementing rules of propositional logic) first; it's called expert systems.

It works very well for some things, but after some efforts to expand it with things like fuzzy logic it became pretty much accepted that we'd reached its limit.
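For readers who haven't met it: fuzzy logic replaces true/false with a degree of truth in [0, 1], and one common choice (Zadeh's operators) evaluates AND as min and OR as max. A minimal sketch of a single expert-system-style fuzzy rule follows; the rule, the membership ramps and every threshold are made up for illustration.

```python
def ramp_up(x: float, lo: float, hi: float) -> float:
    """Degree of membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def fuzzy_and(*degrees: float) -> float:
    return min(degrees)  # Zadeh AND


def fuzzy_or(*degrees: float) -> float:
    return max(degrees)  # Zadeh OR


def discomfort(temp_c: float, humidity_pct: float) -> float:
    """Hypothetical rule: IF (hot AND humid) OR very_hot THEN uncomfortable."""
    hot = ramp_up(temp_c, 25.0, 35.0)
    humid = ramp_up(humidity_pct, 50.0, 90.0)
    very_hot = ramp_up(temp_c, 35.0, 42.0)
    return fuzzy_or(fuzzy_and(hot, humid), very_hot)


print(discomfort(30.0, 70.0))  # 0.5: half "hot", half "humid"
print(discomfort(20.0, 95.0))  # 0.0: humid but not hot at all
```

The graded answers are the point of the extension; the rules themselves still have to be written and tuned by hand.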

You could hope that it would work better, but...

There's a distinction between "constrain the AI's output with logical rules to make it more reliable" and "build logical reasoning into the AI." The current strategies are trying to do the first task and I bet it'll lead to all sorts of cool technology. I strongly doubt these techniques will extend to actual logical reasoning. Intuitively, it feels like throwing a bunch of logical rules onto AI is begging the question - I doubt bird/mammal brains actually have these logical rules baked in, I am sure it's far more sophisticated.

A trivial theorem in logic gives an example of what I mean:

If A then B <=> If (not B) then (not A)

This is really not how humans think - I don't believe we have a "contrapositive calculator" in our brain that takes arbitrary situations in and computes a contrapositive. This contrapositive theorem is a fact of the world that humans used logical thinking to understand, and which can be applied to formal logical computations that human brains aren't necessarily good at.
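For contrast, a machine's "contrapositive calculator" can be as dumb as an exhaustive truth table. This sketch checks that A->B and (not B)->(not A) agree on every assignment while the converse B->A does not; it illustrates only the formal equivalence, not anything about how brains work.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def equivalent(f, g, n_vars: int) -> bool:
    """True if f and g agree on every assignment of n_vars booleans."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))


contrapositive_holds = equivalent(
    lambda a, b: implies(a, b),
    lambda a, b: implies(not b, not a),
    2,
)
converse_holds = equivalent(
    lambda a, b: implies(a, b),
    lambda a, b: implies(b, a),
    2,
)
print(contrapositive_holds, converse_holds)  # True False
```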

Specifically, I don't think non-human animals have "logical" thinking at all, they have causal thinking, and human logic is a consequence of us having exceptionally good understanding of causality. Logic is itself a special case of causality, formalized in a "generic" fashion by human language and used as a tool to help us think through tricky cases.

The contrapositive theorem takes a bit of thought for me to unwind - "so if B is not true then of course A can't be true" - but the way contrapositives are reflected in the real world takes no thought whatsoever, even if the examples are more algebraically complicated than A->B <=> (~B)->(~A):

- If the door is working and I have a key that can unlock the door, then if I can't unlock the door, either I don't have the key or the door is broken. (A^B)->C <=> (~C)->(~A v ~B)

- If having gas implies my car can drive, then if my car can't drive, I don't have gas - or possibly I was incorrect and my car is broken. (A->B <=> (~B)->(~A)) v ~(A->B)
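Reading the door example with A = "the door is working", B = "I have the key", C = "I can unlock the door", the English sentence is (A ^ B) -> C, and its contrapositive is (~C) -> (~A v ~B). A brute-force check in the same spirit confirms that, and also shows that the stronger conclusion (~C) -> (~A ^ ~B) does not follow; the variable names are just labels for this sketch.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def entails(premise, conclusion, n_vars: int) -> bool:
    """True if conclusion holds in every assignment where premise holds."""
    return all(
        conclusion(*v)
        for v in product([False, True], repeat=n_vars)
        if premise(*v)
    )


# a: door is working, b: I have the key, c: I can unlock the door
rule = lambda a, b, c: implies(a and b, c)
broken_or_no_key = lambda a, b, c: implies(not c, (not a) or (not b))
broken_and_no_key = lambda a, b, c: implies(not c, (not a) and (not b))

print(entails(rule, broken_or_no_key, 3))   # True
print(entails(rule, broken_and_no_key, 3))  # False
```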

These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff. I don't think feeding this limited human language into a computer is going to burst past these limits. We need to figure out how bird/mammal brains actually model things causally.

> These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff.

I don't follow this. Didn't you just express these cases in human language? I understand that in reality we can "grasp" the problem of not being able to open the door without expressing or thinking about it verbally. Doing so would be redundant, as there would be a lot to say (the key may be broken, the door may be held by someone on the other side, even if the key works we might be trying to push instead of pull, etc.), and any person who has opened doors with keys would likely understand all of this. The problem is not that those things can't be expressed in human language, but that we lack the ability to build good conceptual models of the world that encompass all such knowledge and allow quick reasoning over it.

I didn't mean the specific cases, I meant the underlying mechanism that our brain uses to reason about these cases. There is something deeper going on that allows us to build rigorous world models from very thin abstractions, which can be applied to a seemingly arbitrary range of problems. It's this rigorous world model which is absent in AI and not currently explained by cognitive science.

In this example, the overall world model is able to easily accommodate "broken door" "functioning door" "key" etc. and come to a specific conclusion about this problem. The specific conclusion can be easily expressed in human language. The world model itself can't.

Aren't animals trained to do all of those things through evolution? Similar to how GPT is trained.

Also how do you prove that GPT is worse at counting?

Because GPT can currently count both apples and oranges.

> Also how do you prove that GPT is worse at counting?

Back in June 2023 GPT-4 was dramatically worse at counting than a pigeon in the sense that it couldn't accurately tell the difference between sentences with 3 words and sentences with 5 words, whereas pigeons can count almost anything up to about 10. It also routinely failed "pick the shorter sentence" tests which I literally took from a test administered to mice. GPT simply doesn't understand what numbers are, whereas pigeons and mice have an intuitive understanding similar to toddlers. You don't need to teach kids what 3 means, you just need to teach them the human symbol for the concept of 3. GPT only has the human symbol and does not seem capable of understanding the concept.

In my testing GPT-4 consistently failed counting / pattern-recognition tests even if you used "chain-of-thought" prompting. As far as I could tell its only true understanding of numbers was "one, two, many." This seems reflected in real use cases, where GPT routinely (and hilariously) ignores commands to return 50 words/etc of output. GPT doesn't know what fifty means, it just knows what various documents that say "word count: 50" look like, and tries to imitate the tone.

Since transformer neural networks lack recursion, I conjecture that GPT will never be able to understand a number larger than 2, even if in specific cases it can solve counting problems up to eleventy billion. This is what I mean by "counting apples, not oranges": its sense of counting is paper-thin and easily fooled by adversarial prompts. It is much harder to fool a mouse or a pigeon.

Many of the tests I ran back in April 2023 no longer work. I strongly suspect this is because OpenAI trained GPT on many of the tests that people were throwing at it, and not because GPT actually became "smarter." I stopped messing around with GPT specifically because OpenAI doesn't issue any release notes, making replicability impossible. Mistral's 7B model was dramatically worse than even GPT-3 at counting, but I doubt they trained it to count. Not sure about LLaMA/etc.

When you are talking about "counting", do you mean the logical process of going "one", "two", "three"... or do you mean the ability to estimate a quantity statistically from the amount of signal you are processing?

E.g., are pigeons actually "counting" the way humans do when they need to be accurate? Or are they just responding to the signal? That would be similar to how a person can tell whether a sound is higher or lower in pitch without being able to state its exact frequency.

To me, pigeons seem to be similarly responding to the amount of "signal" they receive, not actually doing abstract reasoning.

And looking at the studies, it also seems that pigeons had to be trained to count; they weren't able to do it out of the box.

By the way, your criticism of GPT's ability to count the words in a sentence seems quite odd to me, because the input GPT receives is actually tokens, not the words you give it.

Imagine someone asked you a question in English, then translated it into hieroglyphs and showed you only the hieroglyphs, and you didn't know English. Would you be able to count how many words there were in the original English?

So it seems weird to expect GPT to be able to count words in the first place.

However, if it were later taught how many words various combinations of tokens correspond to, it would be able to do that. So perhaps this is what it was taught in the meantime, which would explain its improved ability to count words?
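To illustrate the token point, here is a toy greedy longest-match tokenizer over a made-up vocabulary. Real GPT tokenizers use byte-pair encoding (BPE) with a much larger vocabulary, so the actual splits differ; the sketch only shows that the length of the model's input need not match the word count.

```python
from typing import List, Set

# Made-up subword vocabulary (hypothetical; not any real model's).
VOCAB: Set[str] = {"un", "believ", "able", "count", "ing", "word", "s", " "}


def tokenize(text: str, vocab: Set[str] = VOCAB) -> List[str]:
    """Greedy longest-match segmentation. Characters not covered by the
    vocabulary fall back to single-character tokens."""
    longest = max(len(v) for v in vocab)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "unbelievable counting words"
tokens = tokenize(sentence)
print(len(sentence.split()), len(tokens))  # 3 9
print(tokens)
# ['un', 'believ', 'able', ' ', 'count', 'ing', ' ', 'word', 's']
```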

Thirdly, GPT with Vision can count objects in an image very well, regardless of what the objects are. Does it make mistakes? Sometimes, when objects are not clearly visible, but so would humans and pigeons.
The biological neural structures that encode behavior are “trained” through evolution, but even the most advanced animals rely mostly on conditioned (= learned during their lifetime) reflexes, not on ones “hardcoded” by evolution.

Certainly not much evolutionary “training” has happened in the human brain in the last 3000 years, yet our understanding of the world has advanced enormously. But human thinking (including rationality, mathematics, etc.) is on a different level from even learned animal behavior. Some great apes were taught language and even showed basic abstract conceptual thinking, but never reached the level of 3-5-year-old human children.

The problem with GPTs and other statistical models is that they can learn incredibly complex patterns in anything we can express as bytes, yet fail to learn the simplest mathematical concepts despite being trained on the whole corpus of mathematical texts available on the internet, while kids learn them from classes that could be covered in a single textbook, and adults may need just the textbook.

About the claim that those models are "statistical": why wouldn't you consider human or animal brains to be "statistical" as well?

In the end, it seems that human brains, like any brains, could be thought of as the statistical result of long periods of training, producing output from input. Where am I wrong?

I assume by statistical you mean that the resulting state of the neurons can be represented as numbers, and the pathways through them as probabilities of taking a certain pathway, but it occurs to me that the same is true of the human brain, no?

You are not wrong, but too abstract. Saying that the brain is statistical and can therefore be represented with a statistical model such as an artificial neural net is like saying that the brain is computational and can therefore be represented with a Turing-complete system, or that it is physical and can be represented via physical simulation, etc.
> Aren't animals trained to do all of those things through evolution? Similar to how GPT is trained.

No. Animals evolved and are able to do those things. Evolution is not training, and evolution has approximately zero to do with how transformers work.

What's the difference here?

Both seem to have adaptive neural networks that change over time in response to a reward: for animals, mutated genes are more likely to be passed on if the change was good. Over millions of generations, genes that put the neural networks in a state better able to solve problems within the environment are statistically likely to be passed on, eventually resulting in emergent intelligence. In training, you similarly change the state of the neural network depending on whether the answer is good or bad.

Evolution operates on genes, which do not encode synaptic connections for one thing. The analogy you're making here is so stretched it's hard to begin to say what's wrong with it. Backpropagation and natural selection are about as different as two things can be. About the only thing you can say they have in common is that both can be modeled as optimization processes.
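To make the "both can be modeled as optimization processes" point concrete, and to show how different the mechanisms are, this sketch minimizes the same one-dimensional toy loss twice: once by gradient descent (the update that backpropagation's gradients feed), and once by random mutation plus selection over a population, which uses no gradient at all. The objective, population size and mutation scale are arbitrary choices, and the second loop is a generic evolutionary algorithm, not a model of biological evolution.

```python
import random


def loss(x: float) -> float:
    return (x - 3.0) ** 2  # toy objective, minimum at x = 3


def grad(x: float) -> float:
    return 2.0 * (x - 3.0)  # exact derivative (what backprop would compute)


def gradient_descent(x: float, lr: float = 0.1, steps: int = 100) -> float:
    """One parameter, repeatedly moved against the gradient."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def mutate_and_select(pop_size: int = 20, generations: int = 100,
                      sigma: float = 0.5, seed: int = 0) -> float:
    """A population of candidates; random variation plus selection, no gradient."""
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: each parent produces one randomly mutated child.
        children = [p + rng.gauss(0.0, sigma) for p in population]
        # Selection: only the fittest pop_size of parents + children survive.
        population = sorted(population + children, key=loss)[:pop_size]
    return population[0]  # best survivor


print(gradient_descent(-8.0))
print(mutate_and_select())
```

Both end up near 3, but one follows a computed derivative of a fixed parameter while the other only ever compares outcomes of random changes.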

What's the difference between a star and a bonfire? Both use fuel and produce heat and light.

I mean the point was that LLMs aren't truly problem solvers because they were trained to be, as opposed to having evolved. I'm looking for what the difference is specifically along that dimension. Biological pigeons went through their own evolutionary process, which gave them the kind of neural networks and systems that let them count, though certainly not in all contexts.

So yes, my point is that both involve an optimisation process that, over time, gives rise to those emergent capabilities.

The biggest difference is that training does not change the size or architecture of an artificial neural network, but biological evolution dramatically changes the size and architecture of animals' brains.

Your comparison is sincerely vacuous. It vaguely makes sense if you're talking about GPT-3 to GPT-4 (though I don't think it's helpful). It makes no sense if you're talking about training a single neural network.

I mean that exposure to a lot of training material resulted in the final set of capabilities. Pigeons and their ancestors were exposed throughout evolution to situations that led to the formation of a neural network able to "count". I believe this is not actual "one, two, three" counting, just the number of signals being activated producing a certain output from the pigeon. That differs from how a human counts, except for small numbers, where you can intuitively come up with the number immediately.

The training material was the situations organisms had to respond to; if the response was good, their genes survived, eventually forming neural networks that handled this material well, while also producing emergent behaviour such as being able to "count".

But GPT with Vision can easily do what a pigeon can. What exactly implies that the pigeon is doing it more intelligently?

If you ask either of them for the quantity of something in a picture, I'm pretty sure both respond to the amount of that type of signal received, either through light waves or through the pixels encoded for GPT.

Evidently pigeons can count to 9. When I asked ChatGPT-4 to identify the irrational statements in this comment, it said there were 9, and I'm pretty sure pigeons can't tell if something is irrational or not.
This may only be tangentially related, but you might be interested in the recent research on Qualitative Constraint Satisfaction Problems - a good introduction to the topic is Manuel Bodirsky's habilitation thesis [1].

The purpose of the subject is, roughly speaking, to exhaustively characterize all types of reliable reasoning which can be carried out efficiently - some people say they are searching for "a logic for P". The techniques used are a mix of ideas from model theory, universal algebra, Ramsey theory, and computer science. Given the ridiculously ambitious scope of the project, I think the rate of progress (especially in the past few years) is astounding.
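A qualitative constraint problem has variables over an infinite domain but constraints that only name qualitative relations. The simplest example is points on a line related by <, = or > (often called the point algebra). This brute-force sketch only shows what an instance looks like; it is exponential, whereas the research described above is about which constraint languages admit polynomial-time algorithms. The events and constraints are made up.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# The three basic relations between two points on a line.
BASIC: Dict[str, Callable[[int, int], bool]] = {
    "<": lambda x, y: x < y,
    "=": lambda x, y: x == y,
    ">": lambda x, y: x > y,
}

# A constraint (i, j, allowed): point i relates to point j by one of `allowed`.
Constraint = Tuple[int, int, FrozenSet[str]]


def satisfiable(n_points: int,
                constraints: List[Constraint]) -> Optional[Tuple[int, ...]]:
    """Exhaustive search. Any solution over the rationals uses at most n_points
    distinct values, and replacing values by their ranks keeps every <, =, >
    relation, so searching ranks 0..n_points-1 is enough."""
    for ranks in product(range(n_points), repeat=n_points):
        if all(any(BASIC[r](ranks[i], ranks[j]) for r in allowed)
               for i, j, allowed in constraints):
            return ranks
    return None


# Hypothetical events a=0, b=1, c=2.
consistent = [(0, 1, frozenset({"<"})),
              (1, 2, frozenset({"<", "="})),   # b <= c
              (0, 2, frozenset({"<", ">"}))]   # a != c
cyclic = [(0, 1, frozenset({"<"})),
          (1, 2, frozenset({"<"})),
          (2, 0, frozenset({"<", "="}))]       # c <= a closes a cycle
print(satisfiable(3, consistent))  # (0, 1, 1)
print(satisfiable(3, cyclic))      # None
```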

[1] https://arxiv.org/pdf/1201.0856.pdf

Statistical models based on gigantic text databases do not bring logical reasoning any closer, even if they are called AI.
Something that would massively improve language models' ability to reason is whiteboarding: being trained to make, review, improve, and add to notes, while maintaining a consistent goal.

I am unaware of anyone who can reason to any serious depth without a paper, computational, or actual version of a whiteboard.

This doesn’t seem like a particularly challenging thing to add to current shallow (but now quite wide) reasoning models.
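A minimal sketch of the whiteboard idea, under heavy assumptions: a step of bounded depth can still carry out a long computation if it can write intermediate results somewhere stable and read them back while the goal stays fixed. The "model" here is a hypothetical stand-in that performs exactly one step of Euclid's algorithm per call; `Whiteboard`, `shallow_step` and `solve` are illustrative names, not any real system's API. The step never makes mistakes here; with an unreliable step, errors written to the board would be carried forward.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Whiteboard:
    goal: Tuple[int, int]  # task: gcd(a, b); never changes during the run
    notes: List[Tuple[int, int]] = field(default_factory=list)


def shallow_step(board: Whiteboard) -> None:
    """Hypothetical stand-in for one bounded-depth model call: it reads only
    the latest note (or the goal) and writes exactly one new note."""
    a, b = board.notes[-1] if board.notes else board.goal
    board.notes.append((b, a % b))  # one step of Euclid's algorithm


def solve(a: int, b: int, max_steps: int = 100) -> Tuple[int, Whiteboard]:
    board = Whiteboard(goal=(a, b))
    for _ in range(max_steps):
        x, y = board.notes[-1] if board.notes else board.goal
        if y == 0:  # the notes now contain the answer
            return x, board
        shallow_step(board)
    raise RuntimeError("step budget exhausted")


result, board = solve(1071, 462)
print(result, board.notes)  # 21 [(462, 147), (147, 21), (21, 0)]
```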

Imagine how fast you could think if you had a mentally stable whiteboard that you could perceive as clearly as you can see, and update as fast as you can think the changes.

Our brains have probably been tragically speed limited by our slow vocal & finger speeds for some time.

That will take AIs to a wide AND deep reasoning level far beyond us very quickly.

Now add mental file cabinets and an AI could trivially keep track of many goals and its progress on them. Again, not likely to be a huge challenge to add.

Now, given all that long term reasoning ability, let the AI manage instances of itself working across all the problems with speed adjusted for priority & opportunity.

Finally, have the model record every difficult problem it solved, so its fast, wide (non-whiteboard) abilities can be tuned, moving up level after level. Occasionally do a complete retraining on all data and problem-solution pairs. Again, straightforward scaling.

Every new dimension they scale quickly surpasses us & keeps improving.

At this point, IMHO, anyone pessimistic about AI has expectations far behind the exponential curve we are in. Our minds constantly try to linearize our experiences. This is the worst time in history to be doing that.

>Imagine how fast you could think if you had a mentally stable whiteboard that you could perceive as clearly as you can see, and update as fast as you can think the changes.

Thinking about what I am going to draw or write on the whiteboard takes the bulk of the time, not the act of drawing or writing. The "update as fast as you can think" part will likely be achieved soon with neural interfaces, yet it's hard to imagine that this will lead to "superintelligence" of some sort. Same for "mental file cabinets": real or digital files let us store information trivially, and search systems let us retrieve it pretty quickly, yet somehow Google didn't make everyone who can use it super smart.

Same goes for vocal speed: coming up with the words to describe an idea and coming up with the idea itself are different things, the second being much harder.

> At this point, IMHO, anyone pessimistic about AI has expectations far behind the exponential curve we are in.

The problem is that the crucial aspect of reasoning is missing in the state-of-the-art models right now. We can make LLMs write to and read from files, but as long as there is a chance that any of their output will be incoherent (and there's a good chance of this now) and there is no mechanism to actually check for errors logically, the whole whiteboard architecture will be a huge demonstration of "garbage in, garbage out".

Our minds do operate internally much faster than our mouths or fingers.

Internally, we go from thought to thought at lightning speed compared to how fast we operate when we have to update the subjects of these thoughts on pen and paper, or explain every step of our thinking, as we make it, to someone else verbally.

Our brain is far more densely connected and faster operating than the loop of signals sent to direct a physical arm and hand, pen, and paper, and back through the visual system.

Being able to adjust any stable visualization in the mind by just visualizing the change to instant effect, removes mental friction and increases internal bandwidth.

Any removal of friction or increased bandwidth to thinking is profound.

Slowed more careful thinking, and slower collaborative thinking, are often helpful. But being slowed down by limitations is never a help.

An interesting approach I came across at NeurIPS a few weeks ago is called "ML with Requirements"[1]: https://arxiv.org/abs/2304.03674

My basic understanding is that it combines "standard" supervised learning techniques (neural nets + SGD) with a set of logical requirements (e.g. in the case of annotating autonomous driving data, things like "a traffic light cannot be red and green at the same time"). The logical requirements not only make the solution more practically useful, but can also help it learn the "right" solution with less labelled data.
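One generic way to combine a logical requirement with a neural net is to turn it into a differentiable penalty using a fuzzy t-norm and add it to the supervised loss. The sketch below does that for "a traffic light cannot be red and green at the same time" with the product t-norm; it is not necessarily the method the ROAD-R authors use, and the probabilities, which would normally come from the network's sigmoid outputs, are fixed numbers here.

```python
import math
from typing import Optional, Tuple


def bce(p: float, y: int) -> float:
    """Binary cross-entropy of predicted probability p against label y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def requirement_penalty(p_red: float, p_green: float) -> float:
    """Soft violation of NOT(red AND green): under the product t-norm the
    degree of truth of (red AND green) is p_red * p_green."""
    return p_red * p_green


def total_loss(p_red: float, p_green: float,
               labels: Optional[Tuple[int, int]] = None,
               weight: float = 1.0) -> float:
    penalty = weight * requirement_penalty(p_red, p_green)
    if labels is None:  # unlabelled frame: the requirement still gives a signal
        return penalty
    y_red, y_green = labels
    return bce(p_red, y_red) + bce(p_green, y_green) + penalty


print(round(total_loss(0.9, 0.8), 2))  # 0.72: confident "red and green" is penalised
print(round(total_loss(0.9, 0.1), 2))  # 0.09
```

The unlabelled branch is where a requirement can help with less labelled data: it still pushes predictions away from impossible combinations.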

[1] I don't know if they had a NeurIPS paper about this; I was talking to the authors about the NeurIPS competition they were running related to this approach: https://sites.google.com/view/road-r/home

See https://cacm.acm.org/magazines/2023/6/273222-the-silent-revo... and also modern production rules engines like https://drools.org/

Oddly, back when “expert system shells” were cool people thought 10,000 rules were difficult to handle, now 1,000,000 might not be a problem at all. Back then the RETE algorithm was still under development and people were using linear search and not hash tables to do their lookups.
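The linear-search versus hash-table point can be sketched in a few lines: index rules by the predicate that triggers them, so matching a new fact costs one lookup instead of a scan over every rule. This shows only that indexing step; RETE goes further by compiling rule conditions into a network and caching partial matches across joins. The rule format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rule format: (predicate that triggers the rule, action name).
RULES: List[Tuple[str, str]] = [
    ("temperature", "check_overheat"),
    ("pressure", "check_leak"),
    ("temperature", "log_reading"),
]


def match_linear(predicate: str, rules: List[Tuple[str, str]]) -> List[str]:
    """Scan every rule: cost grows with the total number of rules."""
    return [action for pred, action in rules if pred == predicate]


def build_index(rules: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group rules by their trigger predicate once, up front."""
    index: Dict[str, List[str]] = defaultdict(list)
    for pred, action in rules:
        index[pred].append(action)
    return dict(index)


def match_indexed(predicate: str, index: Dict[str, List[str]]) -> List[str]:
    """One hash lookup: cost depends only on the rules for this predicate."""
    return index.get(predicate, [])


index = build_index(RULES)
print(match_indexed("temperature", index))  # ['check_overheat', 'log_reading']
```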

Also https://github.com/Z3Prover/z3

Note “the semantic web” is both an advance and a retreat in that OWL is a subset of first order logic which is really decidable and sorta kinda fast. It can do a lot but people aren’t really happy with what it can do.
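Z3 is an SMT solver, and its engines build on SAT-solving techniques. The classic textbook SAT procedure, DPLL (unit propagation plus backtracking), fits in a few dozen lines; modern solvers add clause learning (CDCL) and heavy heuristics. This is a toy for intuition, not Z3's API or algorithm.

```python
from typing import Dict, FrozenSet, List, Optional

# DIMACS-style literals: 3 means x3 is true, -3 means x3 is false.
Clause = FrozenSet[int]


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses and delete the opposite
    literal elsewhere. Returns None if a clause becomes empty (conflict)."""
    out: List[Clause] = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = clause - {-lit}
        if not reduced:
            return None
        out.append(reduced)
    return out


def dpll(clauses: List[Clause],
         assignment: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    if any(not c for c in clauses):
        return None
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        lit = next(iter(unit))
        assignment = {**assignment, abs(lit): lit > 0}
        reduced = simplify(clauses, lit)
        if reduced is None:
            return None
        clauses = reduced
    if not clauses:
        return assignment  # every clause satisfied; unlisted variables are free
    # Branch: try a literal from the first clause as true, then as false.
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, {**assignment, abs(choice): choice > 0})
            if result is not None:
                return result
    return None


sat = [frozenset(c) for c in ([1, 2], [-1, 3], [-3])]
unsat = sat + [frozenset([-2, 1])]
print(dpll(sat, {}))    # {3: False, 1: False, 2: True}
print(dpll(unsat, {}))  # None
```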

Thanks, modern rule engines and description logic formalisations are something for me to explore! Are there any other practical applications of such advanced SAT solvers?
At https://www.categoricaldata.net we claim that symbolic AI is also generative, e.g. when used in data warehousing. Instead of, e.g., new images, the generativity gives you new primary and foreign keys, new ontologies, contradiction detection, etc.
Is there a conceptual difference between the categorical database and hypergraph database?
Yes; categories extend traditional graphs with systems of equations. Hypergraphs extend traditional graphs by allowing edges to be between multiple nodes. Most operations on categories are formally undecidable because of the systems of equations; most operations on graphs/hypergraphs are decidable. This makes working with categorical databases a lot like doing computer algebra in e.g. Mathematica and provides a huge increase in expressive power (you can e.g. encode Turing machines with equations.)
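To make "graphs plus systems of equations" concrete: in a categorical schema, tables are nodes, foreign keys are edges, and path equations constrain how edges compose, e.g. "an employee's manager works in the same department as the employee". Checking that one finite instance satisfies such an equation is easy, as sketched below; the hard part the comment refers to is reasoning about what the equations imply in general. The tables and names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical instance of an Employee/Department-style schema:
# each foreign key is a function, stored as a dict from row id to row id.
manager: Dict[str, str] = {"alice": "alice", "bob": "alice", "carol": "carol"}
works_in: Dict[str, str] = {"alice": "sales", "bob": "sales", "carol": "it"}


def follow(path: List[Dict[str, str]], row: str) -> str:
    """Compose foreign keys left to right, starting from one row."""
    for fk in path:
        row = fk[row]
    return row


def equation_holds(lhs: List[Dict[str, str]], rhs: List[Dict[str, str]],
                   rows: Iterable[str]) -> bool:
    """Check the path equation lhs = rhs on every given row."""
    return all(follow(lhs, r) == follow(rhs, r) for r in rows)


# "An employee's manager works in the same department as the employee."
print(equation_holds([manager, works_in], [works_in], manager))  # True
```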
One of approaches is currently on the front page https://news.ycombinator.com/item?id=38767815
"Formal reasoning" or "logic" as you suggest is a model for finding "truth" from static inputs and simple operations. However, if the inputs are random variables (they have an associated distribution) then so (likely) are the outputs, and "truth" is still a random variable. The world we live in is better modeled by the latter than the former, and as such the "decision tree" approach of AI seems like a more reasonable approach and model to finding "truth" than a strictly mathematical approach.
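A small illustration of that point, assuming independent premises: a conclusion reached by a valid rule (here, AND-ing the premises) inherits the uncertainty of its inputs. With each premise 90% likely, a conclusion resting on ten of them holds only about 35% of the time (0.9^10 ≈ 0.349). The Monte Carlo estimate below is a sketch of that calculation, not a model of any particular AI system.

```python
import random


def prob_conclusion(p_premise: float, n_premises: int,
                    trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(all premises true) when each premise is an
    independent Bernoulli(p_premise) random variable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if all(rng.random() < p_premise for _ in range(n_premises)):
            hits += 1
    return hits / trials


for n in (1, 2, 10):
    # estimated probability vs. the exact value 0.9 ** n
    print(n, round(prob_conclusion(0.9, n), 3), round(0.9 ** n, 3))
```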
You can still have a stochastic model that works on and/or produces, e.g., Coq formalism.
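A minimal sketch of that pattern: a stochastic component proposes candidates and an exact checker decides what is accepted, so trust rests on the checker (in the Coq case, the proof checker) rather than on the generator. Here the proposer is blind random guessing and the "formalism" is a tiny CNF formula; both are stand-ins chosen for illustration.

```python
import random
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def generate_and_check(propose: Callable[[random.Random], T],
                       check: Callable[[T], bool],
                       budget: int = 10_000, seed: int = 0) -> Optional[T]:
    """Sample candidates from a stochastic proposer and return the first one
    the exact checker accepts; trust comes from `check`, not from `propose`."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose(rng)
        if check(candidate):
            return candidate
    return None


# Toy stand-in for a formal target: a CNF formula with DIMACS-style literals.
CNF = [(1, -2), (2, 3), (-1, -3), (-3, 2)]


def propose(rng: random.Random) -> Dict[int, bool]:
    return {v: rng.random() < 0.5 for v in (1, 2, 3)}  # blind guessing


def check(assignment: Dict[int, bool]) -> bool:
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in CNF)


print(generate_and_check(propose, check))  # {1: True, 2: True, 3: False}
```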
Chess is best solved by fuzzy fake logic or whatever you want to call it.

Formal correctness is drastically different from “actual reasoning”.

Is there an “I” part in logic at all? We ourselves aren’t logical. We happened to invent/discover logic as a way to interact more closely with the world and learned to basically simulate a weak, leaky logic machine runtime in our minds. Later someone smart offloaded it to electronics (made with that exact principle, btw, which is one of these “hidden right before your eyes” type of nuances). Custom coding is probably the correct answer.
Zephyr is pretty good. Real pragmatist that one.
https://ollama.ai/library/zephyr

I'd point out that higher-bit quantization is certainly better (q8 at minimum for consistently better results).
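Why more bits help: quantization snaps each weight to a grid, and fewer bits mean a coarser grid and larger rounding error. The sketch below uses symmetric round-to-nearest with a single scale for the whole tensor and synthetic Gaussian weights; real GGUF formats such as q8_0 and q4_0 quantize weights in small blocks with their own scales, so the numbers are only indicative.

```python
import math
import random
from typing import List


def quantize_roundtrip(weights: List[float], bits: int) -> List[float]:
    """Symmetric round-to-nearest quantization with one scale for the whole
    tensor, then dequantization back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]


def rms_error(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]  # synthetic weights
for bits in (8, 4):
    err = rms_error(weights, quantize_roundtrip(weights, bits))
    print(bits, f"{err:.2e}")
```

With this scheme the 4-bit grid step is 127/7 ≈ 18 times the 8-bit one, and the rounding error grows accordingly.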

Gemini Ultra should show good progress according to Google - it's supposed to perform better than 85% of computer science competitors, which requires a lot of logical reasoning. Let's see it once it goes live, but it sounds promising.
Their previous model was better than 46% of such competitors (according to them), so 85% seems achievable by throwing more compute resources at typical ML training. After all, training on millions of examples of logical reasoning will undoubtedly store logical rules in the model in some shape or form (it does so even in ChatGPT), yet the results are still "convincing" rather than "correct", or "probably correct" at best, and that is usually achieved with lots of postprocessing on top. GPT-4 scores better than 90% of test takers on the bar exam, yet still manages to fail at reasoning in much simpler domains.
Imaginary models hyped/faked/lipsticked by a PR department, described in the future tense, in a field that advances on a daily basis, make quite a weak argument in most discussions.