Hacker News
by HarHarVeryFunny 823 days ago
> it feels like critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

I wouldn't say LLMs aren't intelligent (at all) since they are based on prediction which I believe is the ability that we recognize as intelligence. Prediction is what our cortex has evolved to do.

Still, intelligence isn't an all or nothing ability - it exists on a spectrum (and not just an IQ score spectrum). My definition of intelligence is "degree of ability to correctly predict future outcomes based on past experience", so it depends on the mechanisms the system (biological or artificial) has available to recognize and predict patterns.

Intelligence also depends on experience, minimally to the extent that you can't recognize (and hence predict) what you don't have experience with, although our vocabulary for talking about this might be better if we distinguished predictive ability from experience rather than bundling them together as "intelligence".

If we compare the predictive machinery of LLMs vs our brain, there is obviously quite a lot missing. Certainly "thinking before speaking" (vs LLM fixed # steps) is part of that, and this Q* approach and tree-of-thoughts will help towards that. Maybe some other missing pieces such as thalamo-cortical loop (iteration) can be retrofitted to LLM/transformer approach too, but I think the critical piece missing for human-level capability is online learning - the ability to act then see the results of your action and learn from that.

We can build a "book smart" AGI (you can't learn what you haven't been exposed to, so maybe unfair to withhold the label "AGI" just because of that) based on current approach, but the only way to learn a skill is by practicing it and experimenting. You can't learn to be a developer, or anything else, just by reading a book or analyzing what other people have produced - you need to understand the real world results of your own predictions/actions, and learn from that.

4 comments

Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel), also quite simple organisms make predictions (e.g., a predator jumping at prey makes a prediction about positions).
>Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel)

Would it?

Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

What is the human predicting there? Why would it need to be a prediction task at all? How about a dada-ist poem? Made-up words and syntax? If it is prediction but the criterion for "what is a good next choice" can totally be made up on the fly - what does the word "prediction" even mean?
>What is the human predicting there?

Their next action - word put on page, and so on.

>Why would it need to be a prediction task at all?

What else would it be?

Note that prediction in LLM terminology doesn't mean "what is going to happen in the future" like Nostradamus. It means "what is a good next word given the input I was given and the words I've answered so far".

>How about a dada-ist poem? Made-up words and syntax?

How about it? People have their training (sensory input, stuff they've read, school, discussions) and sit to predict (come up with, based on what they know) a made-up word and then another.

That is a meaningless definition of prediction if "what is a good next word" has an ever changing definition in humans (as everything would fulfill that definition).
That's the very definition of prediction in an LLM.

What does "has an ever changing definition" mean?

And why "everything would fulfill that definition"?

At any time, what the "good next word" is depends on the state created by our inputs thus far (including chemical/physiological state, like decaying memories, and so on). And not only does "everything" not fulfill it, it can be only a single specific word.

(Same as if we include the random seed among an LLM's inputs: we get the same results given the same training and same prompt).
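
To make the seed point concrete, here is a minimal Python sketch. The `NEXT_WORDS` table and the `generate` function are made up for illustration and stand in for a trained model; nothing here is how a real LLM is implemented. The only claim is that once the seed is treated as part of the input, the "random" choice of each next word is fully determined.

```python
import random

# Hypothetical toy "model": for each previous word, candidate next words
# and their weights. A real LLM computes these from its trained weights.
NEXT_WORDS = {
    "the": (["cat", "dog", "poem"], [0.5, 0.3, 0.2]),
    "cat": (["sat", "slept", "the"], [0.5, 0.3, 0.2]),
    "dog": (["ran", "the"], [0.7, 0.3]),
    "poem": (["the"], [1.0]),
}


def generate(prompt, seed, length=5):
    """Extend the prompt by `length` words, sampling with a seeded RNG."""
    rng = random.Random(seed)  # the seed is just another input
    words = prompt.split()
    for _ in range(length):
        # Unknown words fall back to "the" so generation can continue.
        candidates, weights = NEXT_WORDS.get(words[-1], (["the"], [1.0]))
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(words)
```

Calling `generate("the", seed=42)` twice gives the same text both times; changing the seed may or may not change it.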

> Why would "creating a novel" by a human not itself be text generation based on prediction on what are the next good choices (of themes, words, etc) based on a training data set of lived experience stream and reading other literature?

Unless you're Stephen King on a cocaine bender, you don't typically write a novel in a single pass from start to finish. Most authors plan things out, at least to some degree, and go back to edit and rewrite parts of their work before calling it finished.

That can be expressed as text prediction. You output version 1 then output editing instructions or rewritten versions until you're done.

The real issue is running out of the input window.

> The real issue is running out of the input window.

isn't this what abstractions are for? you summarise the key concepts into a new input window?

Sure, but if we're talking about editing an entire book eventually the fine details do matter. That, and presumably human authors' abstraction/memories of their books are stored in some more compact form than language tokens. Though we can't be sure about that.
Maybe a better way to say it rather than "intelligence is prediction" is that prediction is what supports the behaviors we see as intelligent. For example, prediction is the basis of what-if planning (multi-step prediction), prediction (as LLMs have proved) is the basis of learning and using language, prediction is the basis of modelling other people and their actions, etc. So, ultimately the ability to write a novel is a result of prediction.

Yes, an insect (a praying mantis, perhaps) catching another is exhibiting some degree of prediction, and per my definition I'd say is exhibiting some (smallish) degree of intelligence in doing so, regardless of this presumably being a hard-coded behavior. Prediction becomes more and more useful the better you are at it, from avoiding predators, to predicting where the food is, etc, so this would appear to be the selection pressure that has evolved our cortex to be a very powerful prediction machine.

I think you're confusing prediction with ratiocination.

I'm sure you've deduced hypotheses based solely on the assertion that "contradiction and being are incompatible". Note, there wasn't prediction involved in that process.

I consider prediction as a subset of reason, but not the contrary. Therefore, I beg to differ on the whole assumption that "intelligence is prediction". It's more than that, prediction is but a subset of that.

This is perhaps the biggest reason for the high computational costs of LLMs, because they aren't taking the shortcuts necessary to achieve true intelligence, whatever that is.

> I think you're confusing prediction with ratiocination.

No, exactly not! Prediction is probabilistic and liable to be wrong, with those probabilities needing updating/refining.

Note that I'm primarily talking about prediction as the brain does it - not about LLMs, although LLMs have proved the power of prediction as a (the?) learning mechanism for language. Note though that the words predicted by LLMs are also just probabilities. These probabilities are sampled from (per a selected sampling "temperature" - degree of randomness) to pick which word to actually output.
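
As a rough illustration of that sampling step, here is a minimal Python sketch. The scores in the usage note and the function names `softmax` and `sample_next_word` are hypothetical; a real model produces scores over its whole vocabulary, and real samplers often add further tricks that are left out here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw word scores into probabilities, scaled by temperature."""
    scaled = {word: score / temperature for word, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {word: math.exp(s - top) for word, s in scaled.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_next_word(logits, temperature=1.0, rng=None):
    """Pick one word; temperature 0 is treated as 'always take the top word'."""
    if temperature <= 0:
        return max(logits, key=logits.get)
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]
```

With made-up scores such as `{"cat": 2.0, "dog": 1.0, "fish": 0.1}`, a low temperature pushes almost all of the probability onto "cat", while a high temperature flattens the distribution so the less likely words get picked more often.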

The way the brain learns, from a starting point of knowing nothing, is to observe and predict that the same will happen next time, which it often will, once you've learnt what observations are appropriate to include or exclude from that prediction. This is all highly probabilistic, which is appropriate given that the thing being predicted (what'll happen if I throw a rock at that tiger?) is often semi-random in nature.

We can better rephrase "intelligence is ability to predict well", as "intelligence derives from ability to predict well". It does of course also depend on experience.

One reason why LLMs are so expensive to train is because they learn in an extremely brute force fashion from the highly redundant and repetitive output of others. Humans don't do that - if we're trying to learn something, or curious about it, we'll do focused experiments such as "Let's see what happens if I do this, since I don't already know", or "If I'm understanding this right, then if I do X then Y should happen".

The ability to write a novel is different from actually writing a novel. If prediction forms the basis of (at least some forms of) intelligence, intelligence itself is more than prediction.
That's why I say our vocabulary for talking about these things leaves something to be desired - the way we use the word "intelligence" combines both raw/potential ability to do something (prediction), and the experience we have that allows that ability to be utilized. The only way you are going to learn to actually write a novel is by a lot of reading and writing and learning how to write something that provides the experience you hope it to have.
Kind of agree. I think, though, trying to shoe-horn intelligence into some evolutionary concepts is tricky because it is easy to stack hypotheses there.
>The ability to write a novel is different from actually writing a novel

In what way, except as in begging the question?

Which LLM will on its own go and write a novel? Also, even for humans, just because you technically know how to write a novel, you might fail at it.
>Which LLM will on its own go and write a novel?

Which human will?

We get prompts all the time, it's called sensory input.

Instead of "write a novel" it's more like information about literature, life experience, that partner who broke our heart and triggered our writing this personal novel, and so on.

LLMs have shown that writing a novel can be accomplished as an application of prediction, at least to a certain level of quality.
I have yet to see an LLM write a novel of its own volition.
> online learning - the ability to act then see the results of your action and learn from that.

I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing.

By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long-context, an LSTM-like approach, or something else, is TBD.

By online learning I mean incremental real-time learning (as opposed to pre-training), such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take), then receive the sensory feedback of what actually happened, and use that feedback to improve your predictions for next time.

I don't think there is any substitute for a predict-act-learn loop here - you don't want to predict what someone else has done (which is essentially what LLMs learn from a training set), you want to learn how your OWN predictions are wrong, and how to update them.
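
A minimal Python sketch of that predict-act-learn loop, under loud assumptions: the `environment` function is a made-up world where the outcome is always three times the action, and the `OnlinePredictor` is a one-weight model updated by a plain gradient step after every single observation. It is only meant to show the shape of the loop, not how a brain or an LLM would do it.

```python
class OnlinePredictor:
    """One-weight model that updates itself after every observation."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0  # starts out knowing nothing
        self.learning_rate = learning_rate

    def predict(self, action):
        return self.weight * action

    def learn(self, action, predicted, observed):
        # Use the error in our OWN prediction to adjust the weight.
        error = observed - predicted
        self.weight += self.learning_rate * error * action
        return error


def environment(action):
    """Hypothetical world: the outcome is always 3x the action."""
    return 3.0 * action


def run(steps=100):
    predictor = OnlinePredictor()
    errors = []
    for step in range(steps):
        action = float(step % 3 + 1)            # choose an action
        predicted = predictor.predict(action)   # predict its result
        observed = environment(action)          # act and see what happens
        errors.append(abs(predictor.learn(action, predicted, observed)))
    return predictor, errors
```

Running `run()` starts with a prediction error of 3.0 on the first step and shrinks it towards zero as the weight approaches 3.0, with no separate pre-training phase.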

> By online learning I mean incremental real-time learning, such that you can predict something (e.g. what some external entity is going to do next, or the results of some action you are about to take),

I used to believe this, but the recent era of LLMs has changed my mind. It's clear that the two things are not related: you don't need to update weights in real-time if you can hold context another way (attention) while predicting the next token.

The fact that we appear to remember things with one-shot, online training might be an illusion. It appears that we don't immediately update the weights (long term memory), but we store memories in short term memory first (e.g. https://www.scientificamerican.com/article/experts-short-ter...).

The fundamental difference is that humans do learn, permanently (eventually at least), from prediction feedback, however this works. I'm not convinced that STM is necessarily involved in this particular learning process (maybe just for episodic memories?), but it makes no difference - we do learn from the feedback.

An LLM can perform one-shot in-context learning, which in conversational mode will include (up to context limit) feedback from its actions (output), but this is never learned permanently.

The problem with LLMs not permanently learning from the feedback to their own actions is that it means they will never learn new skills - they are doomed to only learn what they were pre-trained with, which isn't going to include the skills of any specific job unless that specific on-the-job experience of when to do something, or avoid doing it, were made a part of it. The training data for this does not exist - it's not the millions of lines of code on GitHub or the bug fixes/solutions suggested on Stack Overflow - what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle (or equivalent for any other job/skill one might want them to acquire).

It's hard to see how OpenAI or anyone else could provide this on-the-job training to an LLM even if they let it loose in a programming playground where it could generate the training dataset. How fast would the context fill with compiler/link errors, debugger output, program output etc ... once context was full you'd have to pre-train on that (very slow - months, expensive) before it could build on that experience. Days of human experience would take years to acquire. Maybe they could train it to write crud apps or some other low-hanging fruit, but it's hard to see this ever becoming the general purpose "AI programmer" some people think is around the corner. The programming challenges of any specialized domain or task would require training for that domain - it just doesn't scale. You really need each individual deployed instance of an LLM/AI to be able to learn itself - continuously and incrementally - to get the on-the-job training for any given use.

> but this is never learned permanently.

Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> they are doomed to only learn what they were pre-trained with

Fine-tuning.

> The training data for this does not exist

What does "this" refer to? Have you read the Voyager paper? (https://arxiv.org/abs/2305.16291) Any lesson learnt in the library could be used for fine-tuning or the next training run for a base model.

> what would be needed would be the inner thoughts (predictions) of developers as they tackled a variety of tasks and were presented with various outcomes (feedback) continuously throughout the software development cycle

Co-pilot gets to watch people figure stuff out - there's no reason that couldn't be used for the next version. Not only does it not need to read minds, but people go out of their way to write comments or chat messages to tell it what they think is going on and how to improve its code.

> Days of human experience would take years to acquire

And once learnt, that skill will never age, never get bored, never take annual leave, never go to the kids' football games, never die. It can be replicated as many millions of times as necessary.

> they could train it to write crud apps

To be fair, a lot of computer code is crud apps. But instead of learning it in one language, now it can do it in every language that existed on stackoverflow the day before its training run.

> Are you sure? I think "Open"AI uses the chat transcripts to help the next training run?

> Fine-tuning.

The learning that occurs through SGD is proven to be less flexible and generalizing than what happens via context. This is due to the restricted way information flows through transformers, which is further worsened in autoregressive GPTs vs models with bidirectional encoders.

On top of that, SGD already requires a great many examples per concept, and the impact of any single example rapidly diminishes as the learning rate tapers down as training ends. Finetuning a fully trained model is far less efficient and more crippled than learning from context for introducing new knowledge. It's believed that instruction tuning helps reduce uncertainty in token selection more than it introduces new knowledge.
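
To put rough numbers on the learning-rate point, here is a small Python sketch. The cosine schedule and the values of `peak_lr` and `min_lr` are assumptions picked for illustration, not the schedule of any particular model. It only shows that the same gradient moves the weights much less late in training than early on.

```python
import math


def cosine_lr(step, total_steps, peak_lr=1e-3, min_lr=1e-5):
    """Assumed cosine decay from peak_lr at step 0 to min_lr at the end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


def update_size(gradient, step, total_steps):
    """Size of a plain SGD weight change for one example's gradient."""
    return cosine_lr(step, total_steps) * abs(gradient)
```

Comparing `update_size(1.0, 100, 1000)` with `update_size(1.0, 900, 1000)` shows the late example nudging the weights far less than the early one, for an identical gradient.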

> Co-pilot gets to watch people figure stuff out

We don't actually know if that's true. It depends on how many intermediate steps Microsoft records as training data. If enough intermediate steps lead to bad results and needed backtracking, but that erasure is not captured, it will significantly harm model quality. It is not nearly as easy to do well as you make it seem.

All in all, getting online learning into models has proven very challenging. While some "infinite" context alternatives to self-attention are promising for LTM, it'd remain true that the majority of computational power and knowledge resides in the fixed FF weights. If context and weights conflict this can cause degradation during inference. You might have encountered this yourself with GPT4 worsening with search. Lots of research is required to match human learning flexibility and efficiency.

> Co-pilot gets to watch people figure stuff out

There's a reason most jobs require hands-on experience, and can't be learnt just by reading a book about how to do it, or watching someone else work, or looking at something that someone else created.

It's one thing to have a bag full of tools, but another to know how to skillfully apply them, and when to apply them, etc, etc.

You may read a book (or as an LLM ingest a ton of training data) and think you understand it, or the lessons it teaches, but it's not until the rubber hits the road and you try to do it yourself, and it doesn't go to plan, that you realize there are all sorts of missing detail and ambiguity, and all the fine advice in that programming book or stack overflow discussion doesn't quite apply to your situation, or maybe it appears to apply but for subtle reasons really doesn't.

Maybe if developers were forced to talk about every decision they were making all day every day throughout all sorts of diverse projects, from requirements gathering and design through coding and debugging, and an AI had access to transcriptions of these streams of thought, then this would be enough for them to generalize the thought processes enough to apply them to a novel situation, but even then, in this best case hypothetical scenario, I doubt it'd be enough. Certainly just watching a developer's interactions with an IDE isn't going to come remotely close to an LLM understanding of how to do the job of a developer, let alone to the level of detail that could hypothetically let it learn the job without ever having to try it itself.

I also think that many jobs, including developer and FSD, require AGI to backstop the job specific skills, else what do you do when you discover yourself in a situation that wasn't in the book you trained on? So, it's not just a matter of how do you acquire the skills to do a specific job (which I claim requires practice), but what will it take for AI architectures to progress beyond LLMs and achieve the AGI that is also necessary.

I'd say intelligence is a measure of how well you can make use of what you have. An intelligent person can take some pretty basic principles a really long way, for example. Similarly, they can take a basic comprehension of a system and build on it rapidly to get predictions for that system that defy the level of experience they have. Anyone can gather experience, but not everyone can push that experience's capacity to predict beyond what it should enable.
To me, it is one of those things like defining what 'art' is, as in creating a model in our heads around a concept. We take our definitions and then use those to construct models like AI that simulate our model well enough.

In other words, I personally do not believe any system we develop will be truly 'intelligent', since intelligence is a concept we created to help explain ourselves. We can't even truly define it, but yet we try to test technologies we develop to see if they possess it. It is a bit nonsensical to me.

Sure, we created the word intelligence to help describe ourselves, and our differing levels of ability, as well as applying it to animals such as apes or dogs that we see seem to possess some similar abilities.

However, if we want to understand where this rather nebulous ability/quality of "intelligence" comes from, the obvious place to look is our cortex, which it turns out actually has rather simple architecture! If uncrumpled our cortex would be a thin sheet about the size of a tea towel, and consists of six layers of neurons of different types, with a specific pattern of connectivity, and including massive amounts of feedback. We can understand this architecture to be a prediction machine, which makes sense from an evolutionary point of view. Prediction is what lets you act according to what will happen in the future as opposed to being stuck in the present reacting to what is happening right now.

Now, if we analyze what capabilities arise from an ability to predict, such as multi-step what-if planning (multi-step prediction), ability to learn and use language (as proven by LLMs - a predict-next-word architecture), etc, etc, it does appear (to me at least!) that this predictive function of the cortex is behind all the abilities that we consider as "intelligence".

For sure there is very little agreement on a definition of intelligence, but I have offered here a very concrete definition "degree of ability to predict future outcomes based on past experience" that I think gets to the core of it.

Part of the problem people have in agreeing on a definition of intelligence is that this word arose from self-observation as you suggest, and is more a matter of "I know it when I see it" rather than having any better defined meaning. For technical discussion of AI/AGI and brain architecture we really need a rigorously defined vocabulary, and might be better off avoiding such a poorly defined concept in the first place, but it seems we are stuck with it since the word is so entrenched and people increasingly want to compare machines to ourselves and judge whether they too have this quality.

Of course we can test for intelligence, in ourselves as well as machines, by using things like IQ tests to see the degree to which we/they can do the things we regard as intelligent (we'd really need a much deeper set of tests than a standard IQ test to do a good job of assessing this), but the utility of understanding what is actually behind intelligence (prediction!) is that this allows us to purposefully design machines that have this property, and to increasing degrees of capability (via more powerful predictive architectures).

I think that is my overall point though - we created a system (AI) based on how we see one aspect of a particular organ or system (brain, cortex, etc.), and, in this case, labeled intelligence as 'predictive behavior', and so develop systems after that model. But for starters, only mammals and a few other life branches have cortexes, and cortexes weren't always around.

Evolutionary theory isn't hinged on prediction in itself, it's just one possible aspect of it. But, organisms that rely on prediction or primarily see themselves as predictive machines will state the opposite, because we cannot do anything else but model off what we think we know.

It is also further diluted in the sense that we are always limited in what we can model because of the digital nature of our medium as it attempts to model analog systems. It is like saying that the words that I am typing right now are just like having a real human conversation. No, not really. It is a diluted form of conversation that focuses on a specific, bare part of the communicative process.

I don't think people are, yet, deliberately creating predictive machines because they see that as the path to intelligence. Things like ChatGPT are LLMs, born out of that (language model) line of research, where the goal has been to learn the rules of language. The fact that a language model, when made large enough, appears somewhat intelligent was an unexpected surprise.

Different species have evolved to have different capabilities. Humans have evolved to be generalists, able to survive in a huge variety of environments, which requires a high degree of adaptability. The key to adaptability is prediction - the ability to very rapidly (in space of minutes/hours/days - not evolutionary timescales) learn how things work in a new environment or in new conditions.

Not all animals need this degree of adaptability, since they have been able to survive and thrive in long-lasting stable environments. Examples might be crocodiles or sharks - very low intelligence, but great at what they do. Evolution is not generally about prediction or intelligence - it's about optimizing each species for their own environment(s).

We already know how to build machines that are more like crocodiles - great at doing one thing over and over, but now we have the capability and desire to also build machines that are generalists like ourselves, and that requires us to figure out a way how to implement intelligence. Given how hard a problem this has been (and continues to be) to solve, it makes sense to look at our brains for inspiration - where does our own intelligence come from, and it's highly notable that the part of our brain that most differentiates humans from other animals - our large neo-cortex - appears to be a prediction machine ... In studying humans no-one is saying that other animals are the same - it's just that humans are the animal whose capabilities we are trying to reproduce.

As I said, LLMs being intelligent was an accidental discovery - they were expected just to be language models, but it's certainly notable that the only thing they are trained to do is predict next word. They only do one thing, predict, and they exhibit unexpected intelligence, hmmm ...

At this point people are NOT yet all saying "prediction is the key to intelligence, so let's build predictive machines and assume they will be intelligent", but when you look at our cortex and look at LLMs, that does appear to be the obvious direction.

In this case I would say AI is the crocodile, the same as all life is. It's specializing (or becoming specialized) in something, which is prediction, in the same way a human (or any life that shows the same definitions of intelligence as us, like a crow solving a puzzle) can show success in a new or novel situation. But life does not need this definition of intelligence to survive, which leads to the basis of evolutionary theory. The trait of adaptability/prediction/intelligence is not always useful given a niche and can get weeded out, which is why most life does not need it, yet they are still around. In organisms that do possess it, it can be a detriment as well given specific situations (over analyzing, stuck in anxiety, excessive risks to adapt, etc.).

In other words, when we say an LLM is becoming intelligent, it's not that it is in the general sense. It's that we recognize the traits within it because the traits make sense to us and mimic what we define ourselves in terms of specializing, because quite obviously, we made it and provide its data input. But, the key difference is that AI has none of the original impetus or evolutionary pressures that led to our own ability to generalize/specialize. This is because its output is derived from human input, which is fed into it by digitized means, which means there is always some kind of 'loss' since it is a specialized aspect of us.

It is why I made the reference to typing. We are communicating right now, but at the same time, it is a specialized form of it. It is not the full original human experience of talking to one another, but does not have to be in this case, because it works well enough and has some advantages given the niche. If we were using Facetime, it would be much closer, but still not quite the same as being in the same room face-to-face.

In my opinion, we are not so much prediction machines, but rather mimickers who can also create mimics of themselves via what we can make. You do not need to be able to predict that well if you can just mindlessly copy something that succeeded somehow.