by Ukv 554 days ago
> I've been annoyed by the redefinition of artificial intelligence since the LLM boom started

If there's any redefinition, it's being pushed further out. AI was previously used to describe far simpler systems, like expert systems and Deep Blue's alpha–beta search.

> Predicting the next token based on a compressed dataset of human generated content isn't intelligence in any meaningful definition of the word

I'd claim generating the next token is a sufficiently general task that success can depend on essentially arbitrary intellectual capabilities. For instance, reliably completing unseen equations like `2335 + 4612 = ` requires the ability to perform basic arithmetic.
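To make the arithmetic point concrete, here's a rough sketch (all names are hypothetical, and neither "completer" is a real model): a lookup table that only recalls prompts it has seen verbatim, versus a completer that actually adds, both scored on equations held out from the "training" set.

```python
import random

def make_prompts(n, seed=0):
    """Generate n addition prompts paired with their correct completions."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a, b = rng.randint(1000, 9999), rng.randint(1000, 9999)
        pairs.append((f"{a} + {b} = ", str(a + b)))
    return pairs

def memorizer(prompt, seen):
    """Hypothetical completer that can only recall prompts seen verbatim."""
    return seen.get(prompt, "")

def adder(prompt):
    """Hypothetical completer that actually performs the arithmetic."""
    left, right = prompt.rstrip("= ").split(" + ")
    return str(int(left) + int(right))

def accuracy(complete, pairs):
    """Fraction of prompts whose completion matches the correct answer."""
    return sum(complete(p) == answer for p, answer in pairs) / len(pairs)

# "Training" prompts the memorizer has seen, and held-out prompts it has not.
train = dict(make_prompts(1000, seed=1))
test = [(p, a) for p, a in make_prompts(1000, seed=2) if p not in train]

print("memorizer:", accuracy(lambda p: memorizer(p, train), test))
print("adder:    ", accuracy(adder, test))
```

On held-out equations, pure recall gets nothing right, so reliable completion has to come from something that generalizes.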

> using a different definition of "reasoning" than most people would. The latest release of GPT is attempting to mimic reasoning

I think most people initially have some relatively solid definitions of "learning", "reasoning", "language use", etc. similar to how they're being used there - just that when non-humans meet those definitions there's an inclination to create some distinction between "learning" and an elusive "actual learning".

For instance, if something changes to refine its future behavior in response to its experiences (touch hot stove, get hurt, avoid in future) beyond the immediate/direct effect (withdrawing hand) then it can "learn". I think even small microorganisms can learn, with the main requirement being that the organism has some mutable state (can't learn if you can't change). Yet, others will object that "machine learning" is a misnomer because it's "not actual learning" and instead "just mimicking/simulating".
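As a toy illustration of that definition (a sketch only, with hypothetical class and method names, not a model of any real organism): the reflex below reacts the same way every time, while the learner keeps mutable state that changes what it does on later encounters.

```python
class Reflex:
    """No mutable state: reacts to pain, but never changes future behavior."""

    def touch(self, obj, hurts):
        return "withdrew" if hurts else "touched"


class Learner:
    """Keeps mutable state, so one bad experience alters later behavior."""

    def __init__(self):
        self.avoid = set()  # the mutable state that makes learning possible

    def touch(self, obj, hurts):
        if obj in self.avoid:
            return "avoided"          # future behavior refined by experience
        if hurts:
            self.avoid.add(obj)       # lasting change beyond the immediate reaction
            return "withdrew"
        return "touched"


reflex, learner = Reflex(), Learner()
for attempt in range(2):
    print(attempt, reflex.touch("stove", hurts=True), learner.touch("stove", hurts=True))
```

The second time around, only the learner behaves differently, which is the distinction between the direct effect and a change in future behavior.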

To define "reasoning", you have to deal with (at least) the following sub-questions:

1. What is knowledge?

2. How can knowledge be encoded in a machine?

LLMs say that knowledge is encoded in the relationships between words (and, in fact, has been by the corpus of human writing), and that's enough. Expert systems said that knowledge could be encoded in carefully-written rules, and that's enough.

I'm pretty sure that any actually intelligent[1] computer is going to have to have more than one flavor of knowledge representation, and be able to shift between them as the situation warrants.

[1] Whatever "actually intelligent" may mean. I don't have to know what it is, though, to recognize that what we have so far is inadequate.

> To define "reasoning"

I'd say reasoning is the process of applying logic to draw inferences from some information/axioms/assumptions. For instance if you're asked "can a fridge fit in a bread-box?" and (implicitly or explicitly) go through:

1. A fridge is much larger than a bread-box

2. Larger objects cannot fit inside smaller objects without flexibility

3. Neither object is sufficiently flexible

4. Therefore, a fridge cannot fit in a bread-box

Then I'd be happy saying you have used reasoning to reach your answer.
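The four steps above can be sketched as a tiny forward-chaining loop. This is only an illustration under my own assumptions: the fact encoding (tuples like `("larger", x, y)`) and the single hard-coded rule are made up for the example, not taken from any particular inference engine.

```python
def infer(facts):
    """Apply one rule until no new facts appear:
    larger(x, y) and rigid(x) and rigid(y)  =>  cannot_fit(x, y)
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for fact in list(facts):
            if fact[0] != "larger":
                continue
            _, x, y = fact
            conclusion = ("cannot_fit", x, y)
            if ("rigid", x) in facts and ("rigid", y) in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


premises = {
    ("larger", "fridge", "bread-box"),  # step 1
    ("rigid", "fridge"),                # step 3
    ("rigid", "bread-box"),             # step 3
}
# Step 2 is the rule itself; step 4 is the derived conclusion.
print(("cannot_fit", "fridge", "bread-box") in infer(premises))
```

Drop either rigidity fact and the conclusion is no longer derived, which is the "without flexibility" caveat in step 2.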

> How can knowledge be encoded in a machine? [...] LLMs say that knowledge is encoded in the relationships between words [...]

I don't think it'd be fully correct to say that knowledge is only encoded by relations between words. The input/output of the model is tokens of text, but internally it'll be converted into high-dimensional semantic vector spaces of concepts.

Different words describing the same concept ("Bread-Box", "breadbin", ...), or even images in the case of multi-modal models, can be associated with the internal representation of a bread-box, from which useful semantic manipulations/inferences can be made about the concept and not just the word used to reference it (like approximating the bread-box's size, a factor potentially learned from images but applied to answer a textual question).
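A toy picture of what I mean, with the loud caveat that these 3-d vectors are made up by hand for illustration (real models learn much higher-dimensional representations, and nothing here comes from an actual model):

```python
import math

# Hand-made stand-ins for learned embeddings; the numbers are invented.
EMBED = {
    "bread-box": (0.90, 0.10, 0.20),
    "breadbin": (0.88, 0.12, 0.21),
    "fridge": (0.20, 0.90, 0.80),
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_concept = cosine(EMBED["bread-box"], EMBED["breadbin"])
different_concept = cosine(EMBED["bread-box"], EMBED["fridge"])
print(same_concept > different_concept)
```

The point is just that two different surface strings can land on nearly the same internal vector, so anything computed from that vector applies to the concept and not to the particular word.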

> I don't think it'd be fully correct to say that knowledge is only encoded by relations between words. The input/output of the model is tokens of text, but internally it'll be converted into high-dimensional semantic vector spaces of concepts.

All right, how about this: LLMs do have actual knowledge - the knowledge that was encoded in the words in the training data. That's not how they store the data internally, but the actual knowledge comes from there.

And I wasn't saying that that's enough. I was saying that the LLM advocates think, or at least claim, that it's enough.

> LLMs do have actual knowledge - the knowledge that was encoded in the words in the training data. That's not how they store the data internally, but the actual knowledge comes from there.

For non-multimodal models, and minus ephemeral context and what's encoded by the architecture (like the translational invariance of CNNs), I'd agree to that.

> And I wasn't saying that that's enough. I was saying that the LLM advocates think, or at least claim, that it's enough.

Most modern LLMs like GPT-4, LLaMA-3.2, Gemini, or Claude 3.5 are already multimodal (text, images, sometimes video, sometimes audio). If you primarily just meant that's a good pathway to building richer internal world representations (and thus better at answering questions involving 3D geometry, for instance) then I'd also agree there, though I don't see why it'd be a requirement for reasoning/etc. (as opposed to just beneficial).

No, I would put text, images, video, and audio as one kind of "stuff" - NN training stuff. I would put knowledge graphs and rules for reasoning engines as another kind of stuff. If you use "modes" for text and images and so on, then I want something different from just "multimodal". I want left-brain vs right-brain, or slow vs fast, or something on that order. I want a different kind - not just fancier and larger LLMs. I want an LLM coupled to an inference engine with the Cyc encyclopedia available to it... or something in that direction. Maybe further than that.

Just LLMs aren't enough, and they aren't going to be enough.

You use words like "reasoning", but LLMs do not reason in the same way that an inference engine does. They can, at best, simulate it badly. I think we need more - not more of what we've got, but more of a different kind.

> I want something different from just "multimodal". I want left-brain vs right-brain, or slow vs fast, or something on that order. I want a different kind - not just fancier and larger LLMs. I want an LLM coupled to an inference engine with the Cyc encyclopedia available to it...

So if I'm understanding, your objection isn't about the modalities that the model can work with (text, video, diagrams, ...), but about the kinds of processing it can do?

Many modern LLMs support tool calling (e.g. to look up entities in Google's knowledge graph, or evaluate code), mixture-of-experts architectures (specialized subnetworks that are enabled/disabled as needed, typically per token), and chain-of-thought inference (for questions requiring more complex reasoning). Would you consider those to be steps in the right direction?
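For the tool-calling part, the general shape is roughly the following. This is a hedged sketch, not any vendor's actual API: the JSON request format, the `TOOLS` registry, and both stub tools (including the canned knowledge-graph answer) are assumptions made up for the example.

```python
import json

def lookup_entity(name):
    """Stub standing in for a knowledge-graph lookup (hypothetical data)."""
    known = {"bread-box": "a container for storing bread"}
    return known.get(name, "unknown entity")

def add(a, b):
    """Exact arithmetic delegated to ordinary code instead of the model."""
    return a + b

TOOLS = {"lookup_entity": lookup_entity, "add": add}

def handle(model_output):
    """If the model emitted a JSON tool request, run the tool;
    otherwise pass the text through unchanged."""
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output
    if not isinstance(request, dict) or request.get("tool") not in TOOLS:
        return model_output
    return TOOLS[request["tool"]](**request.get("args", {}))

print(handle('{"tool": "add", "args": {"a": 2335, "b": 4612}}'))
print(handle("Just a plain text answer."))
```

In a real system the tool result would be fed back into the model's context for the next generation step; here it's simply returned.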

> You use words like "reasoning", but LLMs do not reason in the same way that an inference engine does

If you view reasoning as something inference engines can do, then I don't think we disagree too much. The remaining difference may just be about error rate - I'm personally fine saying something can reason (at least "to some extent") even if it's a little fuzzy and not 100.0% accurate formal logic (else animals would also be excluded).