by illiarian 1156 days ago
LLMs currently statistically regurgitate existing data. An LLM in the 1600s would tell you that a house layout is "rooms connected to each other" because that would be its pre-existing data. It remains to be seen if LLMs can come up with "oh wait? we can create a passageway, and have rooms open into that" based on statistical models of pre-existing data.

Can it come up with a corridor when it has no idea that such a concept exists? That remains to be seen.

3 comments

> LLMs currently statistically regurgitate existing data.

NO! They do not.

Deep learning models are "universal approximators". Any two-layer neural network with enough parameters, data and training is a universal approximator. That means they can learn ANY continuous relationship to arbitrary accuracy.
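To make the approximation claim concrete, here is a minimal sketch in plain Python. It does not train anything: it builds the weights of a one-hidden-layer ReLU network by hand so that the output interpolates a target function at evenly spaced knots, which is one standard constructive argument for approximation on an interval. The names `build_relu_net` and `max_error`, and the choice of `sin` on [0, 2*pi] as the target, are assumptions made for illustration only.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def build_relu_net(f, a, b, hidden_units):
    """Hand-built one-hidden-layer ReLU net that interpolates f at evenly spaced knots."""
    step = (b - a) / hidden_units
    knots = [a + i * step for i in range(hidden_units + 1)]
    # Slope of the piecewise-linear interpolant on each segment.
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(hidden_units)]
    # Output weight of hidden unit i is the change in slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden_units)]
    bias = f(a)

    def net(x):
        # Hidden layer: relu(x - knot); output layer: weighted sum plus bias.
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return net

def max_error(net, f, a, b, samples=1000):
    """Largest absolute gap between net and f on a dense grid over [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(net(x) - f(x)) for x in xs)

if __name__ == "__main__":
    for units in (4, 16, 64):
        net = build_relu_net(math.sin, 0.0, 2 * math.pi, units)
        print(units, max_error(net, math.sin, 0.0, 2 * math.pi))
```

The error shrinks as hidden units are added, which is the "enough parameters" part of the claim. Whether gradient training actually finds such weights from data is a separate question that this sketch does not address.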

Going beyond two layers, with several layers, problem domain structured architectures, and recurrent connections, they become far more efficient and effective.

So yes, they learn associations, correlations, stochastic models, statistics.

But they also learn to model functional relationships. Which is why they are able to generalize relationships to new situations, and combine previously unrelated relationships in reasonable and surprising ways.

A large part of creativity is putting together previously unrelated concepts and then letting the obvious logic of those relationships combine to result in something new and unexpected.

Note that both combining normally unrelated things, and combining the concepts in some way more or less consistent with what those concepts normally mean, are well within the grasp of current models.

They haven't outclassed our best thinkers. Or any of our best thinking as individuals yet. They are still very limited on problems that require many steps to think through.

But they are definitely, within their limits, being creative.

And they are far, far, FAR from just being statistical parrots.

> and combine previously unrelated relationships in reasonable and surprising ways.

We've yet to see those surprising ways despite all the claims.

Note: what they do already is amazing and surprising in itself (such as "write me a song about quantum physics suitable for a 5 year old"). It's still very much shy of "hey there's this new thing previously unthought of".

> We've yet to see those surprising ways despite all the claims.

This is the one reason everyone is finding them fascinating.

Perhaps you find them boring. Rote. Or something. But the reason non-technical people, as well as technical people, are enjoying and learning by interacting with chat and other models is how often the results are interesting.

I asked ChatGPT-4 to create a Dr. Seuss story about Cat in the Hat and my green conure parrot Teansy, that involved sewing and Italy. It produced a wonderful story of how they met in Italy, became friends, encountered a homeless child with a threadbare blanket and helped the child. Then they began helping others and ended up creating a fashion design studio.

All written in Dr. Seuss prose that made for a perfect children's book.

Pretty creative.

I then asked GPT to continue the story, but as a James Bond novel where one of Teansy's mysterious clients was actually a criminal using the fashion industry to hide his nefarious practices, and that Teansy should help James Bond solve the case.

For that I got another great story, completely consistent with James Bond tropes. It came up with a story line where the fashion industry was used to launder blood diamonds, which I thought was brilliant. A perfectly good rationale for a James Bond villain. The story was great.

Throughout, Chat threw in funny suitable mentions about Teansy's fashion focuses, including feather lined wear, etc.

And all this creativity in a first draft written as fast as I could read it.

A year ago, nothing on the planet but a whimsical human with too much time on their hands (more time than it took Chat) could do this.

--

Obviously, we are discovering Chat can perform far more complex behaviors.

Act as any agent we describe including computer systems, or the internet. Respond quickly to feedback. Form plans. Learn and summarize the grammar of small artificial languages fairly well just from examples, ...

Without interacting with these models we would never have declared these were expected behaviors.

So I don't know on what basis the emergence of these behaviors isn't surprising. Hoped for, envisioned, sure. But hardly an expression of obviously predetermined designed-in capabilities.

This is all interpolation between existing concepts. It is not a counterexample.

> This is all interpolation between existing concepts.

Interpolating sounds like a simple task.

But whether it is depends entirely on the data. Simple data will result in a simple interpolating model.

But complex data requires complex relationships to be learned.

Calling a complex model just an interpolator is like saying human beings are just another bag of atoms doing what atoms do. Technically correct, but missing the significance of humans.

It also isn’t really clear to me that humans aren’t also interpolating between complex existing concepts when we come up with novel thoughts or ideas. Our minds are complex, our pre-existing knowledge base is complex. It’s impossible to know if our unique thoughts aren’t really some complex amalgamation of other thoughts we already have in there somewhere, perhaps a mashup of seemingly unrelated thoughts that just happen to lie closely in the multidimensional space of ideas to the thing we are thinking about. Sounds potentially similar to a complex LLM then, really.

> Can it come up with...when it has no idea that such a concept exists?

I was curious to see if it could.

=== Q. Invent a new statistic that mixes hessian and likelihood with score.

gpt: One potential new statistical concept that mixes Hessian and likelihood with the score function could be called "Hessian-score likelihood ratio".

The idea behind this concept is to use the Hessian matrix, which captures second-order information about the likelihood function, along with the score function, which measures the sensitivity of the likelihood function to changes in the model parameters, to construct a likelihood ratio test statistic.

Q. How do I compute this new statistic?

gpt. To calculate this statistic, we first compute the Hessian matrix of the log-likelihood function evaluated at the maximum likelihood estimate of the parameters. We then calculate the score function evaluated at the maximum likelihood estimate of the parameters.

We can then use these quantities to construct a test statistic that compares the likelihood of the model with the estimated parameters to the likelihood of the model with an alternative set of parameters. Specifically, we take the difference between the log-likelihoods of the two models, and divide by the square root of the determinant of the Hessian matrix, multiplied by the score function.

Q. Of what use is this new statistic?

gpt. The resulting test statistic can be used to test hypotheses about the parameters of the model, such as whether they are equal to a particular value or whether they differ across groups or conditions. The Hessian-score likelihood ratio offers a potentially more powerful and efficient way to test such hypotheses than existing methods, since it makes use of both second-order information about the likelihood function and the sensitivity of the likelihood to changes in the parameters. ====

I am not able to spot any obvious inaccuracies in the above.
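One way to sanity-check the recipe is to compute its ingredients on a toy model. The sketch below is not gpt's statistic. It assumes a normal model with known unit variance and an unknown mean, and the function names, sample values and null value are all made up for illustration. It evaluates the log-likelihood, score and Hessian, and compares the classical likelihood ratio statistic with the classical score statistic, which evaluates the score at the null value.

```python
import math

def log_likelihood(mu, xs):
    """Log-likelihood of N(mu, 1) for the sample xs."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - mu) ** 2 for x in xs)

def score(mu, xs):
    """First derivative of the log-likelihood with respect to mu."""
    return sum(x - mu for x in xs)

def hessian(mu, xs):
    """Second derivative with respect to mu (constant for this model)."""
    return -float(len(xs))

xs = [4.1, 5.3, 4.8, 5.9, 5.2]   # made-up sample
mle = sum(xs) / len(xs)          # the MLE of mu is the sample mean
null_mu = 4.0                    # hypothetical null value

# Classical likelihood ratio statistic: twice the log-likelihood gap.
lr_stat = 2.0 * (log_likelihood(mle, xs) - log_likelihood(null_mu, xs))

# Classical score statistic: squared score at the null over the information.
score_stat = score(null_mu, xs) ** 2 / -hessian(null_mu, xs)

if __name__ == "__main__":
    print("score at MLE: ", score(mle, xs))
    print("score at null:", score(null_mu, xs))
    print("LR statistic: ", lr_stat)
    print("score statistic:", score_stat)
```

In this toy case the score evaluated at the maximum likelihood estimate is zero up to rounding, because that is what defines the estimate. A recipe that multiplies or divides by the score at the MLE therefore needs the score taken somewhere else, such as at the null value, before it produces a usable number.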

>> Can it come up with...when it has no idea that such a concept exists?

> I was curious to see if it could.

But you told it about the concept, "a new statistic that mixes hessian and likelihood with score".

You should try a different experiment. I'm more familiar with architecture than statistics so I'll use the floor plan example. Suppose someone in the 16th century had asked their LLM to address the pain points of the joined-room approach, and the LLM had then conceived of the novel concept of a corridor. [Look up the origin of the word.]

If that 16th century LLM spat out "the overall concept is to distinguish between transient and in repose spaces. There is already something similar in military architecture called 'corridor', which is a strip of land along the outer edge of a ditch. In these new floor plan designs, there will be corridors internal and peripheral to the building that will connect rooms, just like corridors (strips of land) connect lots or permit movement without stepping into a ditch".

-- can this happen? --

You: Invent a novel test statistic that can be used to test hypotheses about the parameters of the model, such as whether they are equal to a particular value or whether they differ across groups or conditions. You can combine existing statistical tools.

gpt: How about "Hessian-score likelihood ratio"? The idea behind this concept is to use the Hessian matrix, which captures second-order information about the likelihood function, along with the score function, which measures the sensitivity of the likelihood function to changes in the model parameters, to construct a likelihood ratio test statistic.

This is interpolation. And more than that, your prompt is the source of the actual novelty, little as it is.

> LLMs currently statistically regurgitate existing data.

This is clearly not true in any meaningful sense - cf. the Othello paper, examples from the top of this very comment thread, etc.

> Can it come up with a corridor when it has no idea that such a concept exists?

Unless I'm missing something, the person I replied to is claiming that it categorically cannot come up with a concept it hasn't been trained on. I'm disagreeing - if a model knows about rooms and doors and floorplans, there's no obvious reason why it mightn't think up an arrangement of those things that would be novel to the people who trained it. If you think the matter remains to be seen, then I'm not sure what you disagree with me about.

In my experience, it can certainly be coaxed into discussing novel concepts that transcend existing knowledge. I'm having fun getting it to explain what a hybrid of a Nelson Enfilade data structure combined with a tensegrity data structure is, and whether that system is novel and brings any benefits. Very interesting, and novel afaik.

It seems like every time someone says that it's doing something novel, they present an example of interpolation between existing concepts.

This is useful, but the source of novelty here is the prompt; the rest is the work of interpolation.

This is all very reminiscent of image generation. There too, novelty is limited to interpolation.

Yes, but isn't that in itself novel? What is it that you want the system to do?

> if a model knows about rooms and doors and floorplans, there's no obvious reason why it mightn't think up an arrangement of those things that would be novel to the people who trained it.

Once again, you're missing the point.

In the 16th century people also knew about floors, and rooms, and floorplans. And yet the first architect to use a corridor did so only in 1597.

What other "corridors" are missing from LLMs' training data? And we're sure it can come up with such a missing concept?

The Othello paper and the examples (are you referring to the example of coming up with new words?) are doing the same thing: they feed the model well-defined pre-established rules that can be statistically combined. The "novel ideas" are not even nearly novel because, well, they follow the established rules.

Could the model invent reversi/othello had it not known about it beforehand? Could the model invent new words (or a new language) had it not known about how to do that beforehand (there's plenty of research on both)? Can it satisfactorily do either even now (for some definition of satisfactorily)?

People believe it can only because the training set is quite vast and the work done is beyond any shadow of the doubt brilliant. That is why the invention of new words seems amazing and novel to many people while others even with a superficial armchair knowledge of linguistics are nonplussed. And so on.

> Could the model invent reversi/othello had it not known about it beforehand?

You've practically restated the paper's findings! :D The LLM knew nothing about othello; it wasn't shown any rules to be recombined. It was shown only sequences of 60 distinct tokens - effectively sentences in an unknown language. The LLM then inferred a model to predict the grammar of that language, and the authors demonstrated that its model functioned like an othello board.

> You've practically restated the paper's findings! :D The LLM knew nothing about othello; it wasn't shown any rules to be recombined.

Literal quote from the paper:

"As a first step, we train a language model (a GPT variant we call Othello-GPT) to extend partial game transcripts (a list of moves made by players) with legal moves."

And then:

"Nonetheless, our model is able to generate legal Othello moves with high accuracy".

So:

- it knows about the game because it was literally shown the game with only the legal moves

- it doesn't produce legal moves all the time (even though it does so with high accuracy)

That's why I say "the work done is beyond any shadow of the doubt brilliant". Because this is a definite leap forward from the status quo. However, it doesn't imply that the models can invent/predict/come up with novel ways of doing something. This is still strictly within the realm of "given existing data, give back a statistically relevant response".

Could it actually invent Reversi/Othello had it not known about it beforehand?

> it was literally shown the game with only the legal moves

It's shown token sequences only. It has no idea they represent a game, or that the game has legal and illegal moves. And more importantly, it has no idea that each token modifies the state of a gameboard, or that simulating how that gameboard changes after every token is the only way to understand the token's grammar. It invents all that.
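To illustrate why legality depends on hidden board state rather than on the tokens alone, here is a minimal sketch of the bookkeeping an explicit simulator has to do. It is an assumption-laden toy, not code from the paper: the names `new_board`, `flips`, `legal_moves` and `play` are hypothetical, squares use the usual a1..h8 labels, and passes and game end are not handled.

```python
# All eight directions a line of flipped discs can run in.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
COLS = "abcdefgh"

def new_board():
    """Standard start: white on d4 and e5, black on d5 and e4 (zero-indexed col, row)."""
    return {(3, 3): "W", (4, 4): "W", (3, 4): "B", (4, 3): "B"}

def name(sq):
    return COLS[sq[0]] + str(sq[1] + 1)

def square(label):
    return (COLS.index(label[0]), int(label[1]) - 1)

def flips(board, sq, player):
    """Discs that would be flipped if player moved on sq (empty list means illegal)."""
    if sq in board:
        return []
    opponent = "W" if player == "B" else "B"
    flipped = []
    for dx, dy in DIRS:
        line = []
        x, y = sq[0] + dx, sq[1] + dy
        # Walk over a run of opponent discs.
        while 0 <= x < 8 and 0 <= y < 8 and board.get((x, y)) == opponent:
            line.append((x, y))
            x, y = x + dx, y + dy
        # The run only counts if it is capped by one of the player's own discs.
        if line and 0 <= x < 8 and 0 <= y < 8 and board.get((x, y)) == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    squares = [(c, r) for c in range(8) for r in range(8)]
    return sorted(name(sq) for sq in squares if flips(board, sq, player))

def play(board, label, player):
    """Apply a move in place; raises ValueError if the move flips nothing."""
    sq = square(label)
    flipped = flips(board, sq, player)
    if not flipped:
        raise ValueError("illegal move: " + label)
    board[sq] = player
    for s in flipped:
        board[s] = player

if __name__ == "__main__":
    board = new_board()
    print(legal_moves(board, "B"))
    play(board, "d3", "B")
    print(legal_moves(board, "W"))
```

Every call to `legal_moves` needs the full board, and the board only exists because `play` updated it after each earlier token. A model that sees nothing but the token stream has to arrive at some equivalent of this state on its own if it is to predict legal continuations reliably.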

> Could it actually invent Reversi/Othello had it not known about it beforehand?

You mean, could an LLM invent othello even if its training material made no mention of the game or its rules? Presumptively, of course - why not? Suppose you go make up an arbitrary board game right now. If you then ask ChatGPT-4 to invent a boardgame of its own, nothing excludes the possibility that it will describe a game isomorphic to yours. Obviously the odds are very low, but why imagine that it's not possible?

You're presenting an example of inference of rules from given data as a counterexample for novelty. They're not even in the same category of thing. Invention is not learning. Sometimes invention is interpolation, but sometimes it isn't: corridors are an interesting example, because they are not obviously a remix of anything.