by wbhart 942 days ago
People have done experiments trying to get GPT-4 to come up with viable conjectures. So far it does such a woefully bad job that it isn't worth even trying.

Unfortunately there are rather a lot of issues which are difficult to describe concisely, so here is probably not the best place.

Primary amongst them is the fact that an LLM would be a horribly inefficient way to do this. There are much, much better ways, which have been tried, with limited success.


After a year the entire argument you make boils down to “so far”.
Whereas your post sounds like "Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons."

Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?

> Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?

Interestingly, we now have human-powered aircraft... We have flown ~60km with human leg power alone. We've also got human-powered ornithopters (flapping-wing designs) which can fly, but only for very short times before the pilot is exhausted.

I expect that another 100 years from now, both records will be exceeded, although probably for scientific curiosity more than because human-powered flight is actually useful.

I knew about the legs (there was a model in the London Science Museum when I was a kid), but I didn't know about the ornithopter.

https://en.wikipedia.org/wiki/UTIAS_Snowbird

13 years ago! Wow, how did I miss that?

> Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons

Yes. Because we haven't yet reached the limit of deep learning models. GPT-3.5 has 175 billion parameters. GPT-4 has an estimated 1.8 trillion parameters. That was nearly a year ago. Wait until you see what's next.

Why would adding more parameters suddenly make it better at this sort of reasoning? It feels a bit of a “god of the gaps” where it’ll just stop being a stochastic parrot in just a few more million parameters.
I don't think it's guaranteed, but I do think it's very plausible because we've seen these models gain emergent abilities at every iteration, just from sheer scaling. So extrapolation tells us that they may keep gaining more capabilities (we don't know how exactly they do it, though, so of course it's all speculation).

I don't think many people would describe GPT-4 as a stochastic parrot anymore... when the paper that coined (or at least popularized) the term came out in early 2021, the term made a lot of sense. In late 2023, with models that at the very least show clear signs of creativity (I'm sticking to that because "reasoning" or not is more controversial), it's relegated to reductionist philosophical arguments, and isn't really a practical description anymore.

I don’t think we should throw out the stochastic parrot so easily. As you say there are “clear signs of creativity” but that could be it getting significantly better as a stochastic parrot. We have no real test to tell mimicry apart from reasoning and as you note we also can only speculate about how any of it works. I don’t think it’s reductionist in light of that, maybe cautious or pessimistic.
> very least show clear signs of creativity

Do you know how that “creativity” is achieved? It’s done with a random number generator. Instead of having the LLM pick the absolute most likely next token, they have it select from a set of most likely next tokens - size of the set depends on “temperature”.

Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.
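For anyone who wants to see the sampling step concretely, here is a minimal sketch. It assumes the common formulation where the logits are divided by the temperature before a softmax; restricting the choice to a fixed set of candidates is usually a separate top-k/top-p setting and is not shown. The function names and the toy logits are made up for illustration and are not any particular library's API.

```python
import math
import random


def token_probabilities(logits, temperature):
    """Turn a {token: logit} dict into probabilities at a given temperature (> 0)."""
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def sample_next_token(logits, temperature, rng=random):
    """Pick the next token; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature <= 0:
        return max(logits, key=logits.get)
    probs = token_probabilities(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical logits for three candidate tokens.
toy_logits = {"the": 2.0, "a": 1.0, "zebra": -1.0}
print(sample_next_token(toy_logits, 0))        # always the top token
print(token_probabilities(toy_logits, 0.5))    # peaked distribution
print(token_probabilities(toy_logits, 2.0))    # flatter distribution
```

At temperature 0 the output is deterministic, and raising the temperature moves probability mass toward unlikely tokens, which matches the "talks in circles" versus "nonsense" extremes described above.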

The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense but because it seems to produce “impressive” results.

People use the term "stochastic parrot" in different ways ... some just as a put-down ("it's just autocomplete"), but others like Geoff Hinton acknowledge that there is of course some truth to it (an LLM is, at the end of the day, a system whose (only) goal is to predict "what would a human say"), while pointing out the depth of "understanding" needed to be really good at this.

There are fundamental limitations to LLMs though - a limit to what can be learned by training a system to predict the next word from a fixed training corpus. It can get REALLY good at that task, as we've seen, to the extent that it's not just predicting the next word but rather predicting an entire continuation/response that is statistically consistent with the training set. However, what is fundamentally missing is any grounding in anything other than the training set, which is what causes hallucinations/bullshitting. In a biological intelligent system predicting reality is the goal, not just predicting what "sounds good".

LLMs are a good start in as much as they prove the power of prediction as a form of feedback, but to match biological systems we need a closed-loop cognitive architecture that can predict then self-correct based on mismatch between reality and prediction (which is what our cortex does).

For all of the glib prose that an LLM can generate, even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good), it doesn't have the intelligence of even a simple animal like a rat that doesn't use language at all, but is grounded in reality.

You can predict performance of certain tasks before training and it's continuous:

https://twitter.com/mobav0/status/1653048872795791360

Why would it not? We've observed them getting significantly better through multiple iterations. It is quite possible they'll hit a barrier at some point, but what makes you believe this iteration will be the point where the advances stop?
Because for what we’re discussing it would represent a step change in capability not an incremental improvement as we’ve seen.
Humans and other animals are definitely different when it comes to reasoning. At the same time, biologically, humans and many other animals are very similar when it comes to the brain, but humans have more "processing power". So it's only natural to expect some emergent properties from increasing the number of parameters.
> it’ll just stop being a stochastic parrot in just a few more million parameters.

It is not a stochastic parrot today. Deep learning models can solve problems, recognize patterns, and generate new creative output that is not explicitly in their training set. Aside from adding more parameters there are new neural network architectures to discover and experiment with. Transformers aren't the final stage of deep learning.

Probabilistically serializing tokens in a fashion that isn't 100% identical to training set data is not creative in the context of novel reasoning. If all it did was reproduce its training set it would be the grossest example of overfitting ever, and useless.

Any actually creative output from these models is by pure random chance, which is most definitely different from the deliberate human reasoning that has produced our intellectual advances throughout history. It may or may not be inferior: there's a good argument to be made that "random creativity" will outperform human capabilities due to the sheer scale and rate at which the models can evolve, but there's no evidence that this is the case (right now).

Ever heard of something called diminishing returns?

The value improvement between 17.5b parameters and 175b parameters is much greater than the value improvement between 175b parameters and 18t parameters.

IOW, each time we throw 100 times more processing power at the problem, we get a measly 2x increase in value.

Yes that's a good point. But the algorithms are improving too.
You are missing the point that it can be a model limit. LLMs were a breakthrough but that doesn’t mean they are a good model for some other problems, no matter the number of parameters. Language contains more than we thought, as GPT has impressively shown (i.e. semantics embedded in the syntax emerging from text compression), but still not every intellectual process is language-based.
I know that, but deep learning is more than LLMs. Transformers aren't the final ultimate stage of deep learning. We haven't found the limit yet.
You were talking about the number of parameters on existing models. As the history of deep learning has shown, simply throwing more computing power at an existing approach will plateau and not result in a fundamental breakthrough. Maybe we'll find new architectures, but the point was that the current ones might be showing their limits, and we shouldn't expect the models to suddenly become good at something they are currently unable to handle because "more parameters".
Indeed. An LLM is an application on a transformer trained with backpropagation. What stops you from adding a logic/mathematical "application" on the same transformer?
Nothing, and there are methods which allow these types of models to learn to use special purpose tools of this kind[1].

[1] https://arxiv.org/abs/2302.04761 Toolformer: Language Models Can Teach Themselves to Use Tools
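
To make the tool-use idea concrete, here is a rough sketch of the execution side only: generated text contains inline tool calls, and a wrapper runs them and splices the results back in. The bracketed `[Calculator(...)]` syntax, `calculator`, and `run_tool_calls` are assumptions for illustration, loosely inspired by the linked paper and not its actual implementation; how a model learns when to emit such calls is not shown.

```python
import ast
import operator
import re

# Arithmetic operators the hypothetical calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate +, -, *, / arithmetic by walking the parsed AST (no eval())."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


# Assumed inline call format: [Calculator(<expression>)]
CALL_PATTERN = re.compile(r"\[Calculator\((.*?)\)\]")


def run_tool_calls(text):
    """Replace every inline calculator call in the text with its computed result."""
    return CALL_PATTERN.sub(lambda m: format(calculator(m.group(1)), "g"), text)


# Pretend this string came from a model that emitted a tool call mid-sentence.
model_output = "17 * 23 is [Calculator(17 * 23)]."
print(run_tool_calls(model_output))
```

The point of the design is that the model only has to decide where a calculation is needed; the arithmetic itself is done by ordinary deterministic code.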