Whereas your post sounds like "Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons."
Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?
> Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?
Interestingly, we now have human-powered aircraft... We have flown ~60 km with human leg power alone. We've also got human-powered ornithopters (flapping-wing designs), which can fly, but only for very short times before the pilot is exhausted.
I expect that 100 years from now, both records will have been exceeded, although probably out of scientific curiosity more than because human-powered flight is actually useful.
> Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons
Yes. Because we haven't yet reached the limit of deep learning models. GPT-3.5 has 175 billion parameters. GPT-4 has an estimated 1.8 trillion parameters. That was nearly a year ago. Wait until you see what's next.
Why would adding more parameters suddenly make it better at this sort of reasoning? It feels a bit like a “god of the gaps” argument, where it’ll just stop being a stochastic parrot in just a few more million parameters.
I don't think it's guaranteed, but I do think it's very plausible, because we've seen these models gain emergent abilities at every iteration just from sheer scaling. So extrapolation suggests that they may keep gaining more capabilities (we don't know exactly how they do it, though, so of course it's all speculation).
I don't think many people would describe GPT-4 as a stochastic parrot already... When the paper that coined (or at least popularized) the term came out in early 2021, the term made a lot of sense. In late 2023, with models that at the very least show clear signs of creativity (I'm sticking to that because whether they "reason" or not is more controversial), the term has been relegated to reductionist philosophical arguments and is not really a practical description anymore.
I don’t think we should throw out the stochastic parrot so easily. As you say, there are “clear signs of creativity”, but that could just be it getting significantly better as a stochastic parrot. We have no real test to tell mimicry apart from reasoning, and, as you note, we can only speculate about how any of it works. In light of that, I don’t think it’s reductionist; maybe cautious or pessimistic.
They can write original stories in a setting deliberately designed not to be found in the training set (https://arxiv.org/abs/2310.08433). To me that's rather strong evidence that they have moved beyond stochastic parrots by now, although I must concede that we know so little about how everything works that who knows.
Do you know how that “creativity” is achieved? It’s done with a random number generator. Instead of having the LLM pick the absolute most likely next token, they have it sample from among the most likely next tokens, and how adventurous that choice is depends on the “temperature”.
Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.
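For readers unfamiliar with the knob being discussed, here is a minimal sketch of temperature sampling over a hypothetical toy vocabulary with made-up logits (nothing here comes from a real model). Strictly speaking, temperature divides the logits before the softmax, which sharpens or flattens the distribution; restricting sampling to a fixed candidate set is a separate technique (top-k or top-p). Temperature 0 is treated as greedy decoding here, since dividing by zero is undefined.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature, rng=random):
    """Pick the next token; temperature 0 falls back to greedy (argmax) decoding."""
    if temperature == 0:
        return vocab[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy vocabulary and logits, for illustration only.
vocab = ["the", "a", "cat", "quantum"]
logits = [3.0, 2.5, 1.0, -2.0]

for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print(sample_next_token(vocab, logits, 0))  # greedy: always "the"
print(sample_next_token(vocab, logits, 1.0, rng))
```

At low temperature almost all probability lands on the top token; at high temperature the distribution drifts toward uniform, so unlikely tokens such as "quantum" start getting picked.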
The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense but because it seems to produce “impressive” results.
I know that, but to me that statement isn't much more helpful than "modern AI is just matrix multiplication" or "human intelligence is just electric current through neurons".
Saying that it's done with a random number generator doesn't really explain the wonder of achieving meaningful creative output, such as being able to generate literature.
> Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.
Sounds like some people I know, at both extremes.
> The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense but because it seems to produce “impressive” results.
They have been designed and trained to solve natural language processing tasks, and are already outperforming humans on many of those tasks. The transformer architecture is extremely well thought out, based on extensive R&D. The attention mechanism is a brilliant design. Can you explain exactly which part of the transformer architecture is poorly designed?
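For reference, the core of the attention mechanism mentioned here is scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension and passed through a softmax, and the resulting weights average the value vectors. The sketch below is a single-head version in plain Python with hypothetical toy vectors; real transformers add learned projections, multiple heads, masking, and batched matrix operations.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical 2-D toy vectors: two queries attending over three tokens.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention(Q, K, V))
```

Each query ends up pulling mostly from the values whose keys it matches, which is the "look up relevant context" behaviour the architecture is built around.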
People use the term "stochastic parrot" in different ways... some just as a put-down ("it's just autocomplete"), but others, like Geoff Hinton, acknowledge that there is of course some truth to it (an LLM is, at the end of the day, a system whose (only) goal is to predict "what would a human say"), while pointing out the depth of "understanding" needed to be really good at this.
There are fundamental limitations to LLMs, though: there is a limit to what can be learned by training a system to predict the next word from a fixed training corpus. It can get REALLY good at that task, as we've seen, to the extent that it's not just predicting the next word but rather predicting an entire continuation/response that is statistically consistent with the training set. However, what is fundamentally missing is any grounding in anything other than the training set, which is what causes hallucinations/bullshitting. In a biological intelligent system, predicting reality is the goal, not just predicting what "sounds good".
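To make "predict the next word from a fixed training corpus" concrete, here is a deliberately tiny stand-in: a bigram model that counts which word follows which in a hypothetical toy corpus. It is nothing like a transformer in capacity, but it shares the property the comment points at: it can only reproduce statistics of its training text and has no way to check a continuation against the world.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Hypothetical toy corpus; the model knows nothing beyond these sentences.
corpus = [
    "the sky is blue",
    "the sky is blue today",
    "the sky is green",
]
model = train_bigram(corpus)
print(predict_next(model, "is"))      # "blue" (seen twice vs. "green" once)
print(predict_next(model, "purple"))  # None: nothing outside the corpus
```

If the corpus said the sky was green more often, the model would confidently predict "green": frequency in the training text, not truth, drives the prediction.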
LLMs are a good start inasmuch as they prove the power of prediction as a form of feedback, but to match biological systems we need a closed-loop cognitive architecture that can predict, then self-correct based on the mismatch between reality and prediction (which is what our cortex does).
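The predict-then-correct loop described here can be illustrated, very loosely, with an error-driven update (the classic delta rule): make a prediction, observe reality, and adjust by a fraction of the mismatch. This is a toy under stated assumptions (a single scalar signal and a hypothetical learning rate of 0.5), not a model of the cortex or of any proposed cognitive architecture.

```python
def closed_loop(observations, learning_rate=0.5, initial_prediction=0.0):
    """Predict each observation, then correct the prediction by the observed error."""
    prediction = initial_prediction
    errors = []
    for observed in observations:
        error = observed - prediction        # mismatch between reality and prediction
        errors.append(error)
        prediction += learning_rate * error  # self-correct toward reality
    return prediction, errors

# Hypothetical stream: the world holds steady at 10.0.
final, errs = closed_loop([10.0] * 8)
print(final)
print([round(e, 3) for e in errs])
```

The error halves on every step because the loop keeps comparing its prediction against the incoming signal; a system trained once on a fixed corpus has no such signal to correct against after training.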
For all of the glib prose that an LLM can generate, even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good), it doesn't have the intelligence of even a simple animal like a rat that doesn't use language at all, but is grounded in reality.
> even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good
It was trained not only to "sound good" aesthetically but also to solve a wide range of NLP tasks accurately. It not only "seems to" understand the prompt but it actually does have a mechanical understanding of it. With ~100 layers in the network it mechanically builds a model of very abstract concepts at the higher layers.
> it doesn't have the intelligence of even a simple animal
It has higher intelligence than humans by some metrics, but no consciousness.
I read that paper back in the day and honestly I don't find it very meaningful.
What they find is that for every emergent ability where an evaluation metric seems to have a sudden jump, there is some other underlying metric that is continuous.
The thing is that the metric with the jump is the one people actually care about (like being able to answer questions correctly), while the continuous one is an internal metric. I don't think that refutes the existence of emergent abilities; it just explains a little bit of how they arise.
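The effect described here can be reproduced with a toy calculation, assuming an answer counts as correct only if all of its tokens are right and that token errors are independent (both simplifications; the per-token accuracies and the answer length are made-up values). Per-token accuracy improves smoothly, yet exact-match accuracy, which is per-token accuracy raised to the answer length, stays near zero and then climbs steeply.

```python
def exact_match_accuracy(per_token_accuracy, answer_length):
    """Probability that every token of an answer is correct, assuming independent errors."""
    return per_token_accuracy ** answer_length

ANSWER_LENGTH = 20  # hypothetical answer length in tokens

# Per-token accuracy rising smoothly, e.g. as models are scaled up (made-up values).
for p in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"per-token {p:.2f} -> exact match {exact_match_accuracy(p, ANSWER_LENGTH):.3f}")
```

The smooth curve is the "internal" metric and the steep one is the metric people care about, which is why both readings of the result can be true at once.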
Why would it not? We've observed them getting significantly better through multiple iterations. It is quite possible they'll hit a barrier at some point, but what makes you believe this iteration will be the point where the advances stop?
Humans and other animals are definitely different when it comes to reasoning. At the same time, biologically, the brains of humans and many other animals are very similar, but humans have more "processing power". So it's only natural to expect some emergent properties from increasing the number of parameters.
> it’ll just stop being a stochastic parrot in just a few more million parameters.
It is not a stochastic parrot today. Deep learning models can solve problems, recognize patterns, and generate new creative output that is not explicitly in their training set. Aside from adding more parameters, there are new neural network architectures to discover and experiment with. Transformers aren't the final stage of deep learning.
Probabilistically serializing tokens in a fashion that isn't 100% identical to training set data is not creative in the context of novel reasoning. If all it did was reproduce its training set it would be the grossest example of overfitting ever, and useless.
Any actually creative output from these models is by pure random chance, which is most definitely different from the deliberate human reasoning that has produced our intellectual advances throughout history. It may or may not be inferior: there's a good argument to be made that "random creativity" will outperform human capabilities due to the sheer scale and rate at which the models can evolve, but there's no evidence that this is the case (right now).
There is also no evidence for your conjecture about there being some sort of grand distinction between "probabilistically serializing tokens" and "deliberate human reasoning" other than scale. There might be, but there is no evidence.
Ever heard of something called diminishing returns?
The value improvement between 17.5b parameters and 175b parameters is much greater than the value improvement between 175b parameters and 18t parameters.
IOW, each time we throw 100 times more processing power at the problem, we get a measly twofold increase in value.
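Taking the rule of thumb above at face value (100x more compute buys a 2x gain in "value"; both numbers are the commenter's figures, not measured results), it implies a power law, value proportional to compute raised to an exponent alpha, with alpha = log 2 / log 100, roughly 0.15. A quick check of what that exponent means:

```python
import math

# The rule of thumb from the comment: 100x compute -> 2x value (an assumption, not data).
COMPUTE_FACTOR = 100
VALUE_FACTOR = 2

# If value ~ compute ** alpha, then VALUE_FACTOR == COMPUTE_FACTOR ** alpha.
alpha = math.log(VALUE_FACTOR) / math.log(COMPUTE_FACTOR)

def value_gain(compute_multiplier):
    """Multiplicative gain in value for a given multiplier on compute, under this power law."""
    return compute_multiplier ** alpha

print(round(alpha, 4))     # 0.1505
print(value_gain(100))     # ~2.0
print(value_gain(10_000))  # ~4.0: 10,000x the compute only quadruples value
```

Under this assumption every further doubling of value costs another 100x in compute, which is what "diminishing returns" means here.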
You are missing the point that this could be a limit of the model. LLMs were a breakthrough, but that doesn’t mean they are a good model for some other problems, no matter the number of parameters. Language contains more than we thought, as GPT has impressively shown (i.e., semantics embedded in the syntax emerging from text compression), but still, not every intellectual process is language-based.
You were talking about the number of parameters in existing models. As the history of deep learning has shown, simply throwing more computing power at an existing approach will plateau and not result in a fundamental breakthrough. Maybe we'll find new architectures, but the point was that the current ones might be showing their limits, and we shouldn't expect the models to suddenly become good at something they are currently unable to handle just because of "more parameters".
Yes, you're right, I only mentioned the size of the model. The rate of progress has been astonishing and we haven't reached the end, in terms of both the size and the algorithmic sophistication of the models. There is no evidence that we have reached a fundamental limit of AI in the context of deep learning.
Indeed. An LLM is an application of a transformer trained with backpropagation. What stops you from adding a logic/mathematics "application" on top of the same transformer?