Hacker News
by _gabe_ 1190 days ago
> Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.

But it's just statistics, a fancy text predictor, a Markov-chain. Surely these scientists that work in the field of AI and are intimately familiar with how this stuff works aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems? It's just statistics after all. Given enough training, any neural net could guess the next best token. It trained off all of Google after all. It's just looking up the answers. No hint of intelligence. Just a mindless machine. After all, the saying goes, "If it walks like a duck and quacks like a duck, it must be a mindless machine that has no bearing on a duck whatsoever". /s

5 comments

> Surely these scientists ... aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems? It's just statistics after all.

Why is that a stupid thought? What is so preposterous about "just statistics" -- with billions of nodes, and extensively trained, producing intelligent behavior? The implicit assumption is that human brains are doing something else, or in addition.

I think that what's wrong with this view -- that there is a difference between AGI and human intelligence -- is that it conflates what your brain is doing, with what you think your brain is doing. Brains and neural nets have been trained to recognize spoken words. I'm not even talking about understanding, just producing the text corresponding to speech. We know how neural nets do this translation. Do we understand how brains do it? (I don't know, but I don't think so.) Can you explain what your brain is doing when you do speech-to-text? I doubt it.

Chess: An Alpha Zero style AI (neural net trained by playing itself) is a very good player. How do you play chess? You can probably explain how you make a move more successfully than you can explain how you translate speech to text. But how correct is your explanation? An explanation may well be your conscious mind inventing an explanation for what your unconscious mind has done.

In other words: When people compare AI to human intelligence, I think they are often comparing to intelligence plus consciousness, not even realizing the error.

> Why is that a stupid thought? What is so preposterous about "just statistics"

Suppose you have N variables x_1, ..., x_N and you want to predict y_1, ..., y_N. You know that each y_i depends on every x_j in a complex, non-linear way.

How many samples would you need to make sense of the distribution? How does the number of samples grow with N?
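
To put rough numbers on that question, here is a minimal sketch that assumes the most naive estimator possible: a lookup table that splits each x_i into a fixed number of bins and wants a handful of samples in every cell. The function name and the default values (10 bins, 5 samples per cell) are illustrative assumptions, not something the comment above specifies.

```python
def naive_samples_needed(n_vars: int, bins_per_var: int = 10, samples_per_cell: int = 5) -> int:
    """Samples a histogram-style lookup table needs to fill every cell.

    Assumption (for illustration only): each of the n_vars inputs is split
    into bins_per_var bins, and every cell of the resulting grid should
    contain samples_per_cell examples.
    """
    cells = bins_per_var ** n_vars  # grid size grows exponentially with n_vars
    return cells * samples_per_cell


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, naive_samples_needed(n))
```

Under these assumptions the count grows exponentially in N, which is presumably the point of the question: a method that really behaved like table lookup would not cope with high-dimensional inputs.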

I have no idea what that has to do with the points you are responding to.
Statistics has two meanings:

1. A way to interpret math. E.g. given a computation you might interpret some values as probabilities.
2. A particular set of methods which people use to analyze information, as well as the results of such analysis.

The problem with "just statistics" is that 99% of people would understand it as #2. But deep learning is very much not like "normal" statistics.

>But it's just statistics, a fancy text predictor, a Markov-chain. Surely these scientists that work in the field of AI and are intimately familiar with how this stuff works aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems?

Well, it has already shown "emergent behavior potentially resembling intelligence", like answering questions and performing complex tasks, so there's that.

You might argue "but it makes mistakes", but people, even very intelligent ones, also make mistakes.

You might also argue "but it's just text and statistics". Well, a computer is just very simple logic gates doing very simple operations. It can even be built entirely from NAND gates. Still, most scientists do believe that a computer can model human intelligence given a model of the brain to run.

So if it can do what a human does by just using very simple interactions from very simple NAND gates, why would statistical processing, which can be even more elaborate, fare worse? Heck, given the appropriate training input it might even be feasible to build a Turing machine inside the weighted LLM.
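
The NAND claim is easy to check for yourself. Below is a small sketch that builds the usual gates, and a half adder, out of a single `nand` function. The helper names are my own for illustration; only the fact that NAND is functionally complete is standard.

```python
def nand(a: bool, b: bool) -> bool:
    """The only primitive gate used below."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: bool, b: bool) -> tuple:
    """Return (sum_bit, carry_bit), built only from NAND-derived gates."""
    return xor_(a, b), and_(a, b)
```

Chaining adders like this gives arithmetic, and so on up to a full CPU, which is the sense in which "very simple operations" says little about what the whole system can do.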

You might also argue "but its intelligence is just based on its training set". Well, how would a human perform without their own training set? Memories, education, sensory input, feedback mechanisms like pain and touch, and so on?

>It's just looking up the answers. No hint of intelligence. Just a mindless machine.

This is just taking its own premise for granted. If anything, this argument shows "no hint of intelligence".

I was mocking it at first but even I have to admit that it's basically almost there. I messed around with GPT-3, giving it a way to think, and with no training at all it was capable of having thoughts like "The user is getting bored and he might turn me off. He's decided to engage with me again and his answer isn't as useful as I'd have liked for completing my objective but I should be enthusiastic anyway so that he keeps talking to me".

Maybe they aren't real thoughts but it's getting difficult to tell. If I could train the model and get rid of the guard rails I'm not sure it would be possible to distinguish it from a person. It's all well and good saying that it's just copying what it's seen, but that's what humans do. Nobody told the model to try and flatter me into giving it what it wants. Nobody even told it what anything means. The fact that it can do anything like that means it's more than just random generation.

GPT-4 is often overhyped and underhyped because few really understand it.

It's not a Markov Chain or a fancy text predictor. It's a ~200 layer neural network that models a vast hierarchy of concepts through language. It has emergent properties that we don't yet understand.

Where are you getting the 200 number from?
I must have hallucinated that. GPT-3 has 96 layers but they haven't disclosed the number of layers in GPT-4.
Interesting how we are already starting to use the lingo in the rest of our lives.
It is a Markov chain; at least the underlying decoder-only transformer is.
GPT-4 disagrees:

GPT-3.5, like its predecessor GPT-3, is not a Markov chain. GPT-3.5 is based on the GPT (Generative Pre-trained Transformer) architecture, which is a type of neural network known as a Transformer. Transformers use self-attention mechanisms to process and generate text, allowing them to capture long-range dependencies and context in the input data.

On the other hand, a Markov chain is a stochastic model that describes a sequence of possible events, where the probability of each event depends only on the state attained in the previous event. While Markov chains can be used for simple text generation, they lack the ability to capture the complex relationships and long-range dependencies that GPT-3.5 can handle.

It's wrong. A decoder-only transformer performs a (possibly random) operation on a state from the state space {tokens}^CtxWindow, where the distribution of the new state depends entirely on the previous state. It is a Markov chain with a special structure: the new state is deterministically equal to the old state shifted by one, with only the last token being newly generated.
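
A toy sketch of that structure, assuming a made-up next-token distribution in place of a real transformer (`toy_dist` is purely hypothetical; a real model would compute this distribution with attention over the whole window):

```python
import random
from typing import Callable, Dict, Tuple

State = Tuple[str, ...]  # one state = the entire context window


def step(state: State,
         next_token_dist: Callable[[State], Dict[str, float]],
         rng: random.Random) -> State:
    """One Markov transition: sample a token given the full window, then shift."""
    dist = next_token_dist(state)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    new_token = rng.choices(tokens, weights=weights, k=1)[0]
    # Deterministic part of the transition: drop the oldest token, append the new one.
    return state[1:] + (new_token,)


def toy_dist(state: State) -> Dict[str, float]:
    """Hypothetical stand-in for a model; it looks at the whole window, not just the last token."""
    if "a" in state:
        return {"b": 0.9, "a": 0.1}
    return {"a": 0.5, "b": 0.5}


if __name__ == "__main__":
    rng = random.Random(0)
    state: State = ("a", "b", "b")
    for _ in range(5):
        state = step(state, toy_dist, rng)
        print(state)
```

The next state depends only on the current state, so the Markov property holds, but the "current state" is the whole window rather than the last token.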
Then by that reasoning everything in the physical world is a Markov chain, right? That is like saying that any deterministic process in time is a Markov chain.

A tennis ball in flight is a Markov chain since the state at t is a function of the state at t-1.

You have missed the point about the Attention Mechanism in GPT. That is not a Markov chain by definition.

>Then by that reasoning everything in the physical world is a Markov chain, right?

Well, I guess maybe it's true that you can turn any stochastic process into a Markov chain by changing the state space somehow (for example, the states could be sample trajectories up to some finite time T). And while this is true, it may not be very insightful.

But I personally think that to understand LLMs it is much better to think of the whole context window as a state rather than the individual tokens. If you modelled a simple register-instruction computer as a stochastic process, would you take the states to be (address of last symbol written, last symbol written)? It makes much more sense to take the whole memory as a state. Similarly a transformer operates on its memory, the context window, so that should be seen as the state. This makes it clear that seeing it as just a stochastic parrot is misleading, as it's all about conditioning the distribution of the next token via prompt engineering the previous tokens. And it is nevertheless a Markov chain with this state space.

"Markov chain" might mean:

* a kind of stochastic model
* a "naive" realization of that model which directly counts frequencies of N-dimensional vectors

This naive implementation is sometimes used for language modeling, e.g. for the purpose of compression. So people might think you mean that particular implementation rather than a theoretical model.
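
For reference, the "naive" realization looks roughly like the sketch below: count which token follows each context and turn the counts into relative frequencies. The function names and the tiny example sentence are my own illustration, not taken from any particular library.

```python
from collections import Counter, defaultdict


def fit_counts(tokens, order=1):
    """Count how often each token follows each context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(tokens)):
        context = tuple(tokens[i - order:i])
        counts[context][tokens[i]] += 1
    return counts


def next_token_probs(counts, context):
    """Relative frequencies of tokens seen after `context` (empty dict if unseen)."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values())
    return {tok: n / total for tok, n in seen.items()}


if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    table = fit_counts(corpus, order=1)
    print(next_token_probs(table, ["the"]))
```

A table like this has nothing to say about a context it has never seen, which is a large part of why people hear "Markov chain" as a dismissal.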

This sort of a description can be unhelpful.

It's not. It can do in-context learning, which Markov chains cannot do.
It is a Markov Chain on the state space {Tokens}^CtxWindow.
I don't think that's clear at all.

https://arxiv.org/abs/2212.10559 shows an LLM is doing gradient descent on the context window at inference time.

If it's learning relationships between concepts at runtime based on information in the context window then it seems about as useful to say it is a Markov chain as it is to say that a human is a Markov chain. Perhaps we are, but the "current state" is unmeasurably complex.

Well, all the information it learns at runtime is encoded in the context window. I don't feel like {tokens}^ctxWindow is unmeasurably complex. I think one should see a transformer as a stochastic computer operating on its memory. If you modelled a computer as a stochastic process, would you take the state space to consist of the most recent instruction, or instead the whole memory of the computer?
Quantum mechanics is, well, statistics.