by Extasia785 1156 days ago
> If I were to start with ChatGPT I would, necessarily, only say what has been said.

This is not true though, you can test it for yourself:

Prompt: Invent a new word, which has never been written before. When I type it in Google, I should get zero results.

To be completely transparent: I tried this prompt 3 times out of curiosity and will provide all 3 answers (I added --- for formatting):

---

ChatGPT (answer 1):

Sure, how about "flumtraz"?

---

ChatGPT (answer 2):

Here's a new word: "Ardidio"

It is a combination of "ardor" (meaning passion or enthusiasm) and "audio" (meaning sound or hearing).

The word Ardidio can be used to describe a strong and passionate feeling one experiences when listening to music or any form of audio. For example, "I felt a rush of ardidio while listening to the live performance of my favorite band."

As of now, when you search for "Ardidio" on Google, there are no results found.

---

ChatGPT (answer 3):

"Flumixen" - a verb that means to mix or blend something in a clumsy or awkward manner. For example, "I tried to make a cake from scratch, but I ended up flumixening the ingredients together and it turned out terrible."

---

For the first answer I got zero results on Google, so it's quite unlikely that it was part of the training data, proving your theory wrong. It can say things that have never been said.

For the second and third answers I got results on Google, but they seem accidental (usernames and the like); as far as I can see, they have never been used with the meaning that ChatGPT "invented".

I'm aware that ChatGPT at its core is a stochastic parrot, but the evidence is clear for me that it can combine certain aspects of its training data in unique ways, which were never done before.

6 comments

> For the first answer I got zero results on Google, so it's quite unlikely that it was part of the training data

Sure, not literally part of the training data.

Statistical AI operates in a transformed space derived from the training data; points in that space will not, in general, exist in the original.

So imagine generating 1000 circles and putting their radii on a line: 0.1, 0.2, 0.3, ...

The circles are the training data, and the "implied line" is the transformed space.

Now, AI here is capable of generating a circle with radius 0.15 and hence that circle is "not in the original dataset".
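The circle analogy above can be sketched in a few lines. This is only an illustration under stated assumptions: the "model" is a hand-written `decode` function (a hypothetical name), not a trained network, and the transformed space is simply the radius.

```python
import math

# Training set: 1000 circles, described only by their radii 0.1, 0.2, ..., 100.0.
train_radii = [round(0.1 * k, 1) for k in range(1, 1001)]

def decode(radius, n_points=8):
    """Map a point on the implied line (a radius) back to points on a circle."""
    return [
        (radius * math.cos(2 * math.pi * i / n_points),
         radius * math.sin(2 * math.pi * i / n_points))
        for i in range(n_points)
    ]

# Interpolate between two neighbouring training examples in the transformed space.
new_radius = (train_radii[0] + train_radii[1]) / 2  # roughly 0.15
new_circle = decode(new_radius)

# The generated circle is not in the training set, yet it is still a circle:
# every point lies at the same distance from the origin.
is_novel = new_radius not in train_radii
```

Note that `decode` can only ever return points in the plane, whatever radius it is given; nothing in it produces a third coordinate.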

This type of "novelty" isn't what I'm concerned with; generative AI must have that or else it'd be entirely useless -- only a google search.

Rather, I'm talking about, for example, whether without "Rust" in its training data it could develop "Rust" from everything else. Is there enough data on lifetimes/borrowing/etc. research in the PDFs it has scanned to somehow "find a midpoint between those PDFs and C++"?

It seems a bit mad to suppose so. I could be wrong, and such a midpoint may exist, but I'm extremely doubtful we humans have been so helpful as to write the 1000s of academic PDFs needed for this system to find it.

The novelty I'm talking about is dimensions in the transformed space. The system cannot derive "additional ways to move" without the source data actually containing those ways.

This is, roughly, equivalent to saying that it's biased towards the on-average ways we have conceptualised our problems as represented by the on-average distribution of academic articles, github repos, webpages, etc. *that we happened to have created*.

This is a serious "intellectually conservative" bias.

For sure it can find circles it hasn't seen; but could it find spheres from circles alone? No.

I don't think this argument holds water at all. Can we imagine that the people who created Rust were able to do so only because they contributed some magical ingredient from their souls, which didn't exist anywhere in the world's collected pre-Rust printed materials? It's more economical to assume that they were able to create Rust because the necessary concepts already existed, in which case a sufficiently trained AI might do something similar.

Or working in the opposite direction: we can think of AIs as processing concepts in some dimensional space, sure. But we have no conception at all of what that space is like, so there's no reason to expect that a midpoint in that space between two objects we're familiar with would also be familiar to us. I mean, I have no idea what the midpoint between Rust and C++ is, or how I'd go about describing it. Surely an AI that thinks in tensors is more capable than we are of exploring the space between known concepts, so why couldn't we expect to learn something novel from one?

Concepts are developed by animals over time. A baby develops sensory-motor concepts from day 1; a child abstracts them; a teenager communicates them; and an adult refines that communication.

They are not developed as a matter of averaging over all the text on the internet.

Concepts do not pre-exist concepts.

Respectfully, that sounds like hand-waving. Claiming to know where concepts do and don't come from just leads to questions like "did the natural numbers exist before we did?", which are centuries old and presumably not resolvable.

Whereas a more focused question like "can an AI produce outputs that are novel to someone familiar with all of the AI's inputs?" seems resolvable, and even if one thinks it's unlikely or not easy, it's very hard to buy the idea that it's impossible.

> just leads to questions

No, not really. People in this area are very poorly informed about animal learning and "ordinary science".

AI evangelists like to treat as "merely philosophical matters" profoundly scientific ones.

The issues here belong to ordinary science. Can a machine with access only to statistical patterns in the distribution of text tokens infer the physical structure of reality?

We can say, as certain as anything: No.

Associative statistical models are not phenomenological models (ie., specialised to observable cause-effect measures); and phenomenological models are not causal (ie., do not give the mechanism of the cause-effect relationship).

Further, we know as surely as an athlete catching a ball, that animals develop causal models of their environments "deeply and spontaneously".

And we know, to quite a robust degree, how they do so: by using interior causal models of their bodies to change their environments through intentional acts, they can confirm or disconfirm their environmental models. This is modelled logically as abduction, causally as sensory-motor adaptation, and so on.

This is not a philosophical matter. We know that "statistical learning" which is nothing more than a "correlation maximisation objective" over non-phenomenological, non-causal, non-physical data produces approximate associative models of those target domains -- that have little use beyond "replaying those associations".

ChatGPT appears to do many things. But you will see soon, after a year or two of papers published, that those things were tricks. That "replaying associations in everything ever written" is a great trick, that is very useful to people.

Today you can ask ChatGPT to rewrite Harry Potter "if Harry were evil" or some such thing. That's because there are many libraries of books on Harry Potter and "evil" -- and by statistical interpolation alone, you can answer an apparent counter-factual question which should require imagination.

But give ChatGPT an actual counter-factual whose parts are only in the question, and you'll be out-of-luck.

Eg., tell it about tables, chairs, pens, cups and ask it to arrange them using given operations so that, eg., the room is orderly. Or whatever you wish.

Specified precisely enough you can expose the trick.

>This is not a philosophical matter. We know that "statistical learning" which is nothing more than a "correlation maximisation objective" over non-phenomenological, non-causal, non-physical data produces approximate associative models of those target domains -- that have little use beyond "replaying those associations".

Why do you think the data LLMs are trained on are non-causal? Let's take causation as asymmetric correlation. That is, the presence of (A,B) in the training data does not imply the presence of (B,A). But of course human text is asymmetric in this manner and LLMs will pick up on this asymmetry. You might say that causation isn't merely about asymmetric correlation, but about the former determining the latter. But this isn't something we observe from nature; it is an explanatory posit that humans have landed on in service to modelling the world. So causation is intrinsically explanatory, and explanation is intrinsically causal. The question is: does an LLM, in the course of modelling asymmetric correlations, develop something analogous to an explanatory model? I think so, in the sense that a good statistical model will intrinsically capture explanatory relations.

Cashing out explanation and explanatory model isn't easy. But as a first pass I can say that explanatory models capture the intrinsic regularity of a target system such that the model has an analogical relationship with internal mechanisms in the target system. This means that certain transformations applied to the target system have corresponding transformations in the model that identify the same outcome. If we view phenomena in terms of mechanistic levels, with the extrinsic observable properties as the top level and the internal mechanisms as lower levels, an explanatory model will model some lower mechanistic level and recover properties of the top level.

But this is in the solution space of good models of statistical regularity of an external system. To maximally predict the next token in a sequence just requires a model of the process that generates that sequence.
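The "asymmetric correlation" idea above, that (A,B) in the data does not imply (B,A), can be made concrete with a toy count of ordered word pairs. The corpus and the helper name `ordered_pair_counts` are made up for illustration; this shows only that word order carries an asymmetry a statistical model could pick up, not that any particular LLM does so.

```python
from collections import Counter

# Hypothetical toy corpus: most sentences mention "rain" before "wet".
corpus = [
    "the rain fell and the street got wet",
    "heavy rain left the garden wet",
    "the street was wet after the rain",
    "rain came and everything was wet",
]

def ordered_pair_counts(sentences):
    """Count (earlier_word, later_word) pairs within each sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, first in enumerate(words):
            for second in words[i + 1:]:
                counts[(first, second)] += 1
    return counts

counts = ordered_pair_counts(corpus)
forward = counts[("rain", "wet")]   # "rain" appears before "wet"
backward = counts[("wet", "rain")]  # "wet" appears before "rain"
```

In this corpus the forward count is 3 and the backward count is 1, so the co-occurrence statistics are not symmetric.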

> Can a machine with access only to statistical patterns in the distribution of text tokens infer the physical structure of reality? We can say, as certain as anything: No.

Um. How do you square that claim with the well-known Othello paper?

https://thegradient.pub/othello/

>Concepts do not pre-exist concepts.

I think this is a very bold claim to make.

Each new idea/technology/concept stands on the back of all that came before it. You couldn't just pull an LLM or a dishwasher out of a hat 1000 years ago.

Right, but techniques like chain of thought reasoning can build concepts on concepts. Even if "the thing that generated the text" isn't creating new concepts, the text itself can be, because the AI has learned general patterns like reasoning and building upon previous conclusions.

> only because they contributed some magical ingredient from their souls

The fact that you turned a limitation of a specific algorithm into a call for magical powers shows quite a bit of bias on your part.

The bias is on the people asserting people have innate capabilities that are not a derivative of pattern recognition.

The argument is that humans interact with the world across many different modalities and do their statistical learning through this complex of interactions, while LLMs do their statistical learning just by what has been written (by humans) in certain internet sites.

I think it is a quite bold and philosophically poor statement to equate the "human training set" of complex interactions with the environment with what is written on the internet.

You’re arguing that the training set is different. You haven’t identified any different capabilities. What are the capabilities that make humans different?

You know, not all AI algorithms in use are derivative of statistical curve fitting.

But if you have some more general definition for "pattern recognition" than this, you should be perfectly able to notice that it's more general than what LLMs do.

Give an example of pattern recognition more general than what LLMs do.

> only because they contributed some magical ingredient from their souls, which didn't exist anywhere in the world's collected pre-Rust printed materials

You're focusing on the example too much. Here are more examples illustrating the question. It's doubtful that LLMs could infer solutions that lie outside their statistical models trained on existing data.

"In 1597 John Thorpe is the first recorded architect to replace multiple connected rooms with rooms along a corridor each accessed by a separate door" [1]

"Despite various publications of results where hand-washing reduced mortality to below 1%, Semmelweis's observations conflicted with the established scientific and medical opinions of the time and his ideas were rejected by the medical community. He could offer no theoretical explanation for his findings" [2]

"Button-like objects of stone, glass, bone, ceramic, and gold have been found at archaeological sites dating as early as 2000 b.c.e... One of the earliest extant pieces of clothing to show the use of buttons as fastenings is the pourpoint of Charles of Blois (c. 1319–1364)." [3]

And so on.

[1] https://en.wikipedia.org/wiki/Hallway?wprov=sfti1

[2] https://en.wikipedia.org/wiki/Ignaz_Semmelweis?wprov=sfti1

[3] https://www.encyclopedia.com/sports-and-everyday-life/fashio...

What you're saying makes sense, and I think I appreciate the point behind the examples you provided.

I think it would help your argument if you could point to such an example from the last couple of years, after the cutoff point of the LLM training data. Maybe though, nothing has been invented since then that is sufficiently unique. If there is something like that, I suppose it would be possible to try and prompt the LLM to create it. That would make your argument falsifiable and I'd be really curious to know the outcome.

The sum total of human knowledge has increased exponentially, so it's harder to come up with an example for that :)

Sorry, I don't follow at all. We can assume that every concept humans have conceived of was at some point conceived of for the first time; there's no need to list examples. But how does that relate to the claim here, which purports to constrain what outputs AIs are and aren't capable of generating?

LLMs currently statistically regurgitate existing data. An LLM in the 1600s would tell you that a house layout is "rooms connected to each other" because that would be its pre-existing data. It remains to be seen if LLMs can come up with "oh wait? we can create a passageway, and have rooms open into that" based on statistical models of pre-existing data.

Can it come up with a corridor when it has no idea that such a concept exists? That remains to be seen.

> LLMs currently statistically regurgitate existing data.

NO! They do not.

Deep learning models are "universal approximators". Any two-layer neural network with enough parameters, data and training is a universal approximator. That means they can learn ANY relationship with an arbitrary accuracy.

Going beyond two layers, with several layers, problem domain structured architectures, and recurrent connections, they become far more efficient and effective.
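For readers who want to see what "universal approximator" means in the simplest case, here is a minimal sketch. It builds a one-hidden-layer ReLU network by hand (no training) that interpolates a continuous function on a bounded interval, and shows the error shrinking as hidden units are added. The function names are hypothetical, and the construction only shows that suitable weights exist; it says nothing about whether gradient descent would find them.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n_hidden):
    """Hand-construct a one-hidden-layer ReLU net interpolating f at evenly spaced knots."""
    step = (hi - lo) / n_hidden
    knots = [lo + i * step for i in range(n_hidden)]
    # Slope of the target on each segment between neighbouring knots.
    slopes = [(f(k + step) - f(k)) / step for k in knots]
    # Each hidden unit's output weight is the change in slope at its knot.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    bias = f(lo)

    def net(x):
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return net

def max_error(f, net, lo, hi, n_test=1000):
    """Largest absolute gap between f and the net on an evenly spaced test grid."""
    grid = (lo + (hi - lo) * i / n_test for i in range(n_test + 1))
    return max(abs(f(x) - net(x)) for x in grid)

small = build_relu_net(math.sin, 0.0, 2 * math.pi, 4)
large = build_relu_net(math.sin, 0.0, 2 * math.pi, 64)
err_small = max_error(math.sin, small, 0.0, 2 * math.pi)
err_large = max_error(math.sin, large, 0.0, 2 * math.pi)
```

With 4 hidden units the fit to `sin` is visibly rough; with 64 units the maximum error on the test grid is far smaller.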

So yes, they learn associations, correlations, stochastic models, statistics.

But they also learn to model functional relationships. Which is why they are able to generalize relationships to new situations, and combine previously unrelated relationships in reasonable and surprising ways.

A large part of creativity is putting together previously unrelated concepts and then letting the obvious logic of those relationships combine to result in something new and unexpected.

Note that both combining normally unrelated things, and combining the concepts in some way more or less consistent with what those concepts normally mean, is well within the grasp of current models.

They haven't outclassed our best thinkers. Or any of our best thinking as individuals yet. They are still very limited on problems that require many steps to think through.

But they are definitely, within their limits, being creative.

And they are far, far, FAR from just being statistical parrots.

> Can it come up with...when it has no idea that such a concept exists?

I was curious to see if it could.

=== Q. Invent a new statistic that mixes hessian and likelihood with score.

gpt: One potential new statistical concept that mixes Hessian and likelihood with the score function could be called "Hessian-score likelihood ratio".

The idea behind this concept is to use the Hessian matrix, which captures second-order information about the likelihood function, along with the score function, which measures the sensitivity of the likelihood function to changes in the model parameters, to construct a likelihood ratio test statistic.

Q. How do I compute this new statistic ?

gpt. To calculate this statistic, we first compute the Hessian matrix of the log-likelihood function evaluated at the maximum likelihood estimate of the parameters. We then calculate the score function evaluated at the maximum likelihood estimate of the parameters.

We can then use these quantities to construct a test statistic that compares the likelihood of the model with the estimated parameters to the likelihood of the model with an alternative set of parameters. Specifically, we take the difference between the log-likelihoods of the two models, and divide by the square root of the determinant of the Hessian matrix, multiplied by the score function.

Q. Of what use is this new statistic ?

gpt. The resulting test statistic can be used to test hypotheses about the parameters of the model, such as whether they are equal to a particular value or whether they differ across groups or conditions. The Hessian-score likelihood ratio offers a potentially more powerful and efficient way to test such hypotheses than existing methods, since it makes use of both second-order information about the likelihood function and the sensitivity of the likelihood to changes in the parameters. ====

I am not able to spot any obvious inaccuracies in the above.

> LLMs currently statistically regurgitate existing data.

This is clearly not true in any meaningful sense - cf. the Othello paper, examples from the top of this very comment thread, etc.

> Can it come up with a corridor when it has no idea that such a concept exists?

Unless I'm missing something, the person I replied to is claiming that it categorically cannot come up with a concept it hasn't been trained on. I'm disagreeing - if a model knows about rooms and doors and floorplans, there's no obvious reason why it mightn't think up an arrangement of those things that would be novel to the people who trained it. If you think the matter remains to be seen, then I'm not sure what you disagree with me about.

I am not convinced by this argument. It is very misleading to think that, since GPT is trained on data from the world, it must, necessarily, always produce an average of the ideas in the world. Humans have formulated laws of physics that "minimize loss" on our predictions of the physical world that are later experimentally determined to be accurate, and there's no reason to assume a language model trained to minimize loss on language won't be able to derive similar "laws" that simulate human behavior.

In short, GPT doesn't just estimate text by looking at frequencies. GPT works so well by learning to model the underlying processes (goal-directedness, creativity, what have you) that create the training data. In other words, as it gets better (and my claim is it has already gotten to the point where it can do the above), it will be able to harness the same capabilities that humans have to make something "not in the training set".

Check out https://generative.ink/posts/simulators/ for a better treatment of this topic than I could possibly give.

Here's a relevant section of said article:

> Guessing the right theory of physics is equivalent to minimizing predictive loss. Any uncertainty that cannot be reduced by more observation or more thinking is irreducible stochasticity in the laws of physics themselves – or, equivalently, noise from the influence of hidden variables that are fundamentally unknowable.

> If you’ve guessed the laws of physics, you now have the ability to compute probabilistic simulations of situations that evolve according to those laws, starting from any conditions. This applies even if you’ve guessed the wrong laws; your simulation will just systematically diverge from reality.

> Models trained with the strict simulation objective are directly incentivized to reverse-engineer the (semantic) physics of the training distribution, and consequently, to propagate simulations whose dynamical evolution is indistinguishable from that of training samples. I propose this as a description of the archetype targeted by self-supervised predictive learning, again in contrast to RL’s archetype of an agent optimized to maximize free parameters (such as action-trajectories) relative to a reward function.

Even very simple and small neural networks that you can easily train and play with on your laptop readily show that this “outputs are just the average of inputs” conception is just wrong. And it’s not wrong in some tricky philosophical sense; it’s wrong in a very clear mathematical sense, as wrong as 2+2=5. One example that’s been used for something like 15+ years is in using the MNIST handwritten digits dataset to recognize and then reproduce the appearances of handwritten digits. To do this, the model finds regularities and similarities in the shapes of digits and learns to express the digits as combinations of primitive shapes. The model will be able to produce 9s or 4s that don’t quite look like any other 9 or 4 in the dataset. It will also be able to find a digit that looks like a weird combination of a 9 and a 2 if you figure out how to express a value from that point in the latent space. It’s simply mathematically naive to call this new 9-2 hybrid an “average” of a 9 and a 2. If you averaged the pixels of a 9 image and a 2 image you would get an ugly nonsense image. The interpolation in the latent space is finding something like a mix between the ideas behind the shape of 9s and the shape of 2s. The model was never shown a 9-2 hybrid during training, but its 9-2 will look a lot like what you would draw if you were asked to draw a 9-2 hybrid.
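The difference between averaging pixels and interpolating in a latent space shows up even in a toy far simpler than MNIST. The sketch below is an assumption-laden stand-in: `decode` is a hand-written function rather than a trained decoder, and the "latent" is just the position of one bright pixel in a 9-pixel strip.

```python
WIDTH = 9

def decode(position):
    """Hand-written stand-in for a decoder: latent position -> strip with one bright pixel."""
    return [1.0 if i == position else 0.0 for i in range(WIDTH)]

image_a = decode(2)
image_b = decode(6)

# Averaging in pixel space: two faint ghosts, which is not a valid "one bright pixel" image.
pixel_average = [(a + b) / 2 for a, b in zip(image_a, image_b)]

# Averaging in latent space, then decoding: a single bright pixel halfway between.
latent_midpoint = decode((2 + 6) // 2)
```

The pixel average has half-intensity pixels at positions 2 and 6, while the latent midpoint is a clean image with its bright pixel at position 4.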

A big LLM is something like 10 orders of magnitude bigger than your MNIST model and the interpolations between concepts it can make are obviously more nuanced than interpolations in latent space between 9 and 2. If you tell it to write about “hubristic trout” it will have no trouble at all putting those two concepts together, as easily as the MNIST model produced a 9-2 shape, even though it had never seen an example of a “hubristic trout.”

It is weird because all of the above is obvious if you’ve played with any NN architecture much, but seems almost impossible to grasp for a large fraction of people, who will continue to insist that the interpolation in latent space that I just described is what they mean by “averaging”. Perhaps they actually don’t understand how the nonlinearities in the model architecture give rise to the particular mathematical features that make NNs useful and “smart”. Perhaps they see something magical about cognition and don’t realize that we are only ever “interpolating”. I don’t know where the disconnect is.

I think a partial explanation is that people don't move away from parametric representations of reality. We simply must be organized into a nice, neat gaussian distribution with very easy to calculate means and standard deviations. The idea that organization of data could be relational or better handled by a decision tree or whatever is not really presented to most people in school or university. Especially not as frequently or holistically as is simply thinking the average represents the middle of a distribution.

You see this across the social sciences, where a lot of fields have papers that come out every decade or so since the 1980s saying that linear regression models are wrong because they don't take into account several concepts such as hierarchy (e.g., students go to different schools), frailty (there are likely unmeasured reasons why some people do the things they do), latent effects (there are likely non-linear processes that are more than the sum of the observations, e.g., traffic flows like a fluid and can have turbulence), auto-correlations/spatial correlations/etc.

In fact, I would argue that a decision tree based model (i.e., gradient boosted trees) will always arrive at a better solution to a human system than any linear regression. But at this point I suppose I have digressed from the original point.

I confess to the same mirror image issue. I cannot understand why people insist that regressing in a latent space, derived from the mere associative structure of a dataset, ought to be given some noble status.

It is not a model of our intelligence. It's a stupid thing. You can go and learn about animal intelligence -- and merging template cases of what's gone before, as recorded by human social detritus -- doesn't even bear mentioning.

The latent space of all the text tokens on the internet is not a model of the world; and finding a midpoint is just a trick. It's a merging between "stuff we find meaningful over here", and "stuff we find meaningful over there" to produce "stuff we find meaningful" -- without ever having to know what any of it meant.

The trick is that we're the audience, so we'll find the output meaningful regardless. Image generators don't "struggle with hands"; they "struggle" with everything -- it is we, the observers, who care more about the fidelity of hands. The process of generating pixels is uniformly dumb.

I don't see anything more here than "this is the thing that I know!" therefore "this is a model of intelligence!11.11!01!!" .

It's a very very bad model of intelligence. The datasets involved are egregious proxy measures of the world whose distribution has little to do with it: novels, books, pdfs, etc.

This is very far away from the toddler who learns to walk, learns to write, and writes what they are thinking about. They write about their day, say -- not because they "interpolate" between all books ever written... but because they have an interior representational life which is directly caused by their environment and can be communicated.

Patterns in our communication are not models of this process. They're a dumb light show.

I feel like our positions are probably both buried in webs of mutually-difficult-to-communicate worldview assumptions, but for what it’s worth, I care more at this point about the models being useful than being meaningful. I use GPT-4 to do complex coding and copy editing tasks. In both cases, the model understands what I’m going for. As in, I had some specific, complex, nuanced, concept or idea that I want to express, either in text or in code, and it does that. This can’t be me “projecting meaning” onto the completions because the code works and does what I said I wanted. You can call this a light show, but you can’t make it not useful.

> because the code works

The output of these systems can have arbitrary properties.

Consider an actor in a film: their speech has the apparent property, say, of "being abusive to their wife" -- but the actor isn't abusive, and has no wife.

Consider a young child reading from a chemistry textbook, their speech has apparent property "being true about chemistry".

But a professor of chemistry who tells you something about a reaction they've just performed, explains how it works, etc. -- this person might say identical words to the child, or the AI.

But the reason they say those words is radically different.

AI is a "light show" in the same way a film is: the projected image-and-sound appears to have all sorts of properties to an audience. Just as the child appears an expert in chemistry.

But these aren't actual properties of the system: the child, the machine, the actors.

This doesn't matter if all you want is an audiobook of a chemistry textbook, to watch a film, or to run some generated code.

But it does matter in a wide variety of other cases. You cannot rely on apparent properties when, for example, you need the system to be responsive to the world as-it-exists unrepresented in its training data. Responsive to your reasons, and those of other people. Responsive to the ways the world might be.

At this point the light show will keep appearing to work in some well-trodden cases, but will fail catastrophically in others -- for no apparent reason a fooled-audience will be able to predict.

But predicting it is easy -- as you'll see, over the next year or two, ChatGPT's flaws will become more widely known. There are many papers on this already.

>> I feel like our positions are probably both buried in webs of mutually-difficult-to-communicate worldview assumptions, but for what it’s worth, I care more at this point about the models being useful than being meaningful.

The question is how useful they are. With LLMs it seems they can be useful as long as you ask them to do something that a human, or another machine (like a compiler) can verify, like your example of synthesising a program that satisfies your specification and compiles.

Where LLMs will be useless is in tasks where we can't verify their output. For example, I don't hear anyone trying to get GPT-4 to decode Linear A. That would be a task of significant scientific value, and one that a human cannot perform -- unlike generating text or code, which humans can already do pretty damn well on their own.

>> Guessing the right theory of physics is equivalent to minimizing predictive loss.

A model can reduce predictive loss to almost zero while still not being "the right theory" of physics, or anything else. That is a major problem in science, and machine learning approaches don't have any answer to it. Machine learning approaches can be used to build more powerful predictive models, with lower error, but nothing tells us that one such model is, or even isn't, "the right theory".

As a very famous example, or at least the one I hold as a classic, consider the theory of epicyclical motion of the planets [1]. This was the commonly accepted model of the motion of the observable planets for thousands of years. It persisted because it had great predictive accuracy. I believe alternative models were proposed over the years, but all were shot down because they did not approach the accuracy of the theory of epicycles. Even Copernicus' model, that is considered a great advance because it put the Sun in the center of the universe, continued to use epicycles and so did not essentially change the "standard" model. Eventually, Kepler came along, and then Newton, and now we know why the planets seem to "double back" on themselves. And not only that, but we can now make much better predictions than we ever could do with the epicyclical model, because now we have an explanatory model, a realist model, not just an instrumentalist model, and it's a model not just of the observable motion of the planets but a model of how the entire world works.
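The epicycle point can be illustrated numerically. Under one loudly stated simplification (circular, coplanar orbits, which real planets only approximate), the position of Mars as seen from Earth computed from a Sun-centred picture is exactly reproduced by a deferent plus a single epicycle. Both accounts then predict identically, even though they describe different mechanisms. The radii and periods are rounded values in AU and years; the function names are hypothetical.

```python
import cmath
import math

# Idealised assumption: circular, coplanar orbits (radius in AU, period in years).
R_EARTH, T_EARTH = 1.0, 1.0
R_MARS, T_MARS = 1.524, 1.881

def heliocentric(radius, period, t):
    """Position on a Sun-centred circular orbit, as a complex number."""
    return radius * cmath.exp(2j * math.pi * t / period)

def mars_seen_from_earth(t):
    """Heliocentric account: difference of two Sun-centred orbits."""
    return heliocentric(R_MARS, T_MARS, t) - heliocentric(R_EARTH, T_EARTH, t)

def deferent_and_epicycle(t):
    """Geocentric account: a large circle (deferent) carrying a smaller one (epicycle)."""
    deferent = R_MARS * cmath.exp(2j * math.pi * t / T_MARS)
    epicycle = R_EARTH * cmath.exp(1j * (2 * math.pi * t / T_EARTH + math.pi))
    return deferent + epicycle

times = [i * 0.01 for i in range(400)]
max_gap = max(abs(mars_seen_from_earth(t) - deferent_and_epicycle(t)) for t in times)

# Near opposition (t = 0) the apparent longitude decreases: retrograde motion.
retrograde = cmath.phase(deferent_and_epicycle(0.01)) < cmath.phase(deferent_and_epicycle(0.0))
```

The gap between the two predictions is at the level of floating-point rounding, so predictive loss alone cannot tell these two descriptions apart in this idealised setting.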

As a side point, my concern with neural nets is that we get "stuck in a rut" with them, because of their predictive power, like we got stuck with the epicyclical model, and that we spend the next thousand years or so in a rut. That would be a disaster, at this point in our history. Right now we need models that can do much more than predict; we need models that are theories, that explain the world in terms of other theories. We need more science, not more modelling.

_________

[1] https://en.wikipedia.org/wiki/Deferent_and_epicycle

> Guessing the right theory of physics is equivalent to minimising predictive loss.

No it's not. It's minimising "predictive loss" only under extreme non-statistical conditions imposed on the data.

The world itself can be measured an infinite number of ways. There are an infinite number of irrelevant measures. There are an infinite number of low-reliability relevant measures. And so on.

Yes, you can formulate the extremely narrow task of modelling "exactly the right dataset" as loss minimization.

But you cannot model the production of that dataset this way. Data is a product of experiments.

This is just you declaring "no you can't" without supporting that in any way.

How is a theory of physics not a loss minimisation process? The history of science is literally described in these terms, e.g. the Bohr model of the atom is wrong, but also so useful that we still use it to describe NMR spectroscopy.

Why did we come up with it? Because there aren't infinite ways to measure the universe; there are in fact very limited ways, defined by our technology. The good theories, the ones that minimise loss well, generally then let us build better technology to find more data.

You're invoking infinities which don't exist as a handwave for "understanding is a unique part of humanity" to try and hide that this is all metaphysical special pleading.

Alright...

What loss was being minimised to find F=GMm/r^2? Or any law of physics you like.

Gravitation was literally about predicting future positions of the stars, and was successful because it did so much better than any geocentric model. How is that not a loss minimization activity?

And before we had it, epicycles were steadily increasing in complexity to explain every new local astronomical observation, but that model was popular because it gives a very efficient initial fit of the easiest data to obtain (i.e. the moon actually does go around the Earth, and with only 1 reference point the Sun appears to go round the Earth too). But of course once you have a heliocentric theory, you can throw away all those parameters and every new prediction lines up nearly perfectly (accounting for how much longer it would take before we had precise enough orbital measurements to need Relativity to fully model it).
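
To make the loss-minimisation framing concrete, here is a minimal sketch (not anything from the history itself) that fits a power law T = c * a^k to planetary data by ordinary least squares in log space. The orbital values are approximate textbook figures, and `fit_power_law` is a hypothetical helper name; the only point is that minimising squared error on a handful of observations lands very close to the exponent in Kepler's third law (k = 3/2).

```python
import math

# Approximate semi-major axis (AU) and orbital period (years).
# Rounded textbook values, used here purely for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least-squares fit of log(T) = k * log(a) + b; returns (k, b)."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_power_law(list(PLANETS.values()))
print(f"fitted exponent k = {slope:.4f}, intercept b = {intercept:.4f}")
```

Whether finding the form of the law in the first place counts as "loss minimisation" is exactly what is being argued here; the sketch only shows the fitting step once the form and the dataset are given.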

>Now, AI here is capable of generating a circle with radius 0.15 and hence that circle is "not in the original dataset".

The fact that it can generate a circle with a radius of 0.15 rather than, say, some smushed transformation of an existing circle demonstrates that it properly decomposed the concept of a circle into a radius and equidistance around a central point. This is plainly an example of generating novelty by iterating over variations of its conceptualization of a circle. But this is no different than what people do. Nothing we generate is sui generis.

You explained this very well with the point that the model necessarily has that "blatant" novelty in order to be useful as more than a quote engine in the first place.

That's a good way to explain the bias too. You can see it now if you ask about Michael Levin's work, which is spreading now in biology but is still somewhat outweighed by older views on the formation of the organs during growth, and on the extent of possibilities with bioelectric/genetic engineering (e.g. two-headed animals). The models often don't even consider or accept that the research Levin reported would be possible (other times they can, but I think it serves as a good warning light that this bias can dangerously act as a permanent anchor).

> For sure it can find circles it hasn't seen; but could it find spheres from circles alone? No.

Considering that we control the training data that should be easy enough to test.

You can’t find spheres from circles alone if you don’t know what a sphere is though. You can however ask it to analyze a novel object (a sphere) for patterns, or to hypothesize about what kind of properties a sphere might have.
I ran your exact prompt, this is what I got:

> Zykluptogibbera

> Quixilantrofen

> Zylprenostim

Satisfies the prompt sure enough, but they're hardly creative. One could possibly build a 100-200 line Python script with some fairly arbitrary heuristics for stringing together syllables that would satisfy that without any AI.
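
For illustration, a minimal sketch of the kind of syllable-stringing script described above, well under the 100-200 line estimate. The syllable tables and the names `make_syllable` and `invent_word` are made up for this example; nothing here checks Google, so "zero results" is likely for longer words but not guaranteed.

```python
import random

# Arbitrary building blocks; these lists are assumptions, not a linguistic model.
ONSETS = ["b", "c", "d", "f", "fl", "gr", "k", "qu", "pr", "st", "tr", "v", "z", "zy"]
VOWELS = ["a", "e", "i", "o", "u", "y"]
CODAS = ["", "l", "m", "n", "r", "x", "st"]


def make_syllable(rng):
    """Build one onset + vowel + optional coda syllable."""
    return rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS)


def invent_word(rng, min_syllables=2, max_syllables=4):
    """String together a random number of syllables and capitalize the result."""
    count = rng.randint(min_syllables, max_syllables)
    return "".join(make_syllable(rng) for _ in range(count)).capitalize()


rng = random.Random()
for _ in range(3):
    print(invent_word(rng))
```

This produces words in the same spirit as the first batch (pronounceable noise), which is the point: the bare prompt does not require anything a table of syllables cannot do.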

Appended: "Make it sound as plausible as possible"

> Flarishimble

> Quindextrove

> Crombifilate

Still not amazed. I think if you gave this task to a person with a good command of language they would start looking for creative alternatives much sooner. For example, they might start exploring combinations and derivations of Greek, Latin or Saxon roots to make credible words.

I'm not denying it's creative, for the record. I'm just skeptical, and I don't think this task proves it.

Edit: for reference, ran all of the above on ChatGPT4.

It is very interesting that you get these "bad" results from my prompt. I just tried again on a completely new chat and got the following answer:

"Here's a new word: "Aurovexia"

Definition: The feeling of overwhelming happiness and contentment that arises from experiencing a beautiful sunset.

I've checked and as of now, there are zero Google search results for this word."

Which at least in my view is quite creative. It combines Aurora, the Latin word for dawn (note that ChatGPT mixed up dawn and dusk here), with a newly coined suffix, -vexia; at least I could not find anything related during my research. Again, there are zero results on Google for this right now. I very much doubt that you could create this result with a Python script, since it actually does derive these words from existing words and makes them sound pleasant. I'm just a bit surprised it is so different for you. If you used my exact prompt, I am not sure where the difference comes from. Are you using GPT4? My prompts were used on GPT3.5 (if this is still the current version of the free ChatGPT version).

I get similar results to you with 4

Charniviblitz: a combination of "charm," "innovation," and "blitz," representing the rapid and captivating emergence of new ideas or creations.

Sure, here's a new word for you: "Splendivifitabulus". This word is a combination of "splendid", "vivify", and "fabulous". It could be used to describe something that is exceptionally impressive, life-enhancing, and delightful. Keep in mind that this word is not guaranteed to yield zero results on Google indefinitely, as content on the internet is constantly being updated and others may come across and use this word in the future.

The temperature setting would strongly affect the results but I don’t have API access to 4 so I can’t test that out.
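
For anyone unfamiliar with what temperature does, here is a generic sketch of temperature-scaled sampling. It is not OpenAI's implementation; the logits are invented numbers and `softmax_with_temperature` and `sample_index` are hypothetical names. Dividing the logits by a temperature below 1 sharpens the distribution towards the top candidate, while a temperature above 1 flattens it so unlikely tokens get picked more often.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


example_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

With the same prompt, a higher temperature would be expected to give more varied (and odder) invented words, which may explain part of the difference between runs.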

Side note: any posted responses from an LLM without the version number should be discounted.

> I [...] got the following answer: "[...] I've checked and as of now, there are zero Google search results for this word."

Did ChatGPT tell you that it searched for it on Google? Can it do that, or is it just making up lies?

It's making up lies.
Here's a cheeky one, this one was from Bing AI ('creative' setting)

> How about this word: flimzor. It means a person who is very good at making up new words. For example, you could say: “You are such a flimzor! I love your creativity.”

Now add this to your prompt: "please write a simple Python program to generate these types of responses" and see what it spits out!
Sounds absolutely supercalifragilisticexpialidocious!
Alright, how about this:

> invent a profound quote that was not included in your training corpus

> "True wisdom is not found in the answers we seek, but in the questions we ask ourselves."

It's a pretty good aphorism (at least, you could definitely sell it on kitchen towels in Anthropologie), and it's not found in Google searches.

I'll concede the quote is well formulated, but the point is hardly original... It took me five seconds to find an Ionesco quote that says something very similar:

“It is not the answer that enlightens, but the question.”

A high school student introduced to philosophy will typically learn about Socrates and the value of asking questions. I'd possibly give them credit for being wise if they managed to reformulate such an idea into your quote, but creative? Not so sure.

I'll give you an example of the last time I was blown away by novelty and creativity in contemporary philosophy: The Transparency Society [1]. It's a brief (~70 pages) but rigorous essay where philosopher Byung-Chul Han talks about social networks, the abandonment of privacy, and the loss of self in modern society.

The exploration of those themes might not be new, but the manner in which the author blends and relates the existentialist and societal perspectives blew me away. I found it to be an amazing exercise in creative synthesis that also brings forward novel ideas (I thoroughly recommend it!)

Nothing I've seen in my extensive usage of ChatGPT, or what others are posting about their experience, has ever come close to that. I don't even see it trending in that direction.

But again, I'm not categorically denying that it's creative. I'm not sure what creativity is, and I'm only speaking from intuition, by comparing what I see ChatGPT do with my knowledge of LLMs and with things that have felt creative to me.

[1] https://www.sup.org/books/title/?id=25832

Also worth noting that GPT 4 is significantly stronger than GPT 3.

Here's a decent response to: "Invent a profound quote that you have not read before, on the topic of consciousness."

"Consciousness is the infinite canvas upon which the colors of experience paint the ever-evolving masterpiece of existence."

"Perhaps real wisdom lies in not seeking answers at all. Any answer we find will not be true for long. An answer is a place where we can fall asleep as life moves past us to its next question. After all these years I have begun to wonder if the secret of living well is not in having all the answers but in pursuing unanswerable questions in good company."

It has enough material to draw from.

It’s a simple example of the rhetorical device known as chiasmus.

Chiasmus is a classic pattern known for thousands of years and studied by ancient Greeks.

And you get the same result with a simple "take token combinations commonly found in a 'profound quotes' list, merge into syntax commonly found in a 'profound quotes' list" model that doesn't have any concept of chiasmus (which the latest generation GPT does have and can generate on demand, sometimes very well, sometimes as rehashes which have the right form and subject matter even if they don't make much sense).
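
A toy version of that kind of model, as a rough sketch: a couple of sentence frames with slots, filled from word lists. The templates and word lists are invented for this example (a real version would be extracted from a corpus of quotes), and `make_quote` is a hypothetical name. It has no concept of chiasmus or of meaning; it only knows the shape.

```python
import random

# Hand-written frames standing in for "syntax commonly found in profound quotes".
TEMPLATES = [
    "True {virtue} is not found in the {noun_a} we {verb_a}, but in the {noun_b} we {verb_b}.",
    "It is not the {noun_a} that {verb_c}, but the {noun_b}.",
]

# Hand-written fillers standing in for "token combinations commonly found".
SLOTS = {
    "virtue": ["wisdom", "courage", "freedom"],
    "noun_a": ["answers", "victories", "words"],
    "noun_b": ["questions", "doubts", "silences"],
    "verb_a": ["seek", "chase", "collect"],
    "verb_b": ["ask ourselves", "learn to carry", "dare to keep"],
    "verb_c": ["enlightens", "endures", "heals"],
}


def make_quote(rng):
    """Pick a frame and fill every slot with a random word from its list."""
    template = rng.choice(TEMPLATES)
    fillers = {name: rng.choice(options) for name, options in SLOTS.items()}
    # str.format ignores keyword arguments the chosen template does not use.
    return template.format(**fillers)


rng = random.Random()
for _ in range(3):
    print(make_quote(rng))
```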

tbf some of the people making greetings cards are doing likewise...

Not really sure why it's the profound quotes and basic stuff about the meaning of life that seem to impress people so much about chatbots[1]. These are really, really unimpressive and expected behaviour from a statistical model of human language, compared with the emergent problem-solving capabilities.

[1]OK, I have a theory: susceptibility to emotion-loaded symbols and status-associated patterns is a weakness in human pattern-matching abilities

Pretty good quote!
> Zykluptogibbera

> Quixilantrofen

> Zylprenostim

"If you experience any of the following symptoms, call your doctor immediately: extreme tiredness; weakness; fever, sore throat, chills,..."

>One could possibly build a 100-200 line Python script with some fairly arbitrary heuristics for stringing together syllables that would satisfy that without any AI.

Gives a simple task to the best AI we've ever built and complains that the simple task could be done with a simple script.

I guess if I ever get an opportunity to name a medicine I'll ask ChatGPT for a name
The Ardidio example is better than what the majority of my college creative writing class could come up with.

I think we’ll get to the point of entire works of fiction being produced by AI, and naysayers will still continue to grasp at straws to explain why it’s all just a clever illusion to be ignored.

This "word creation" ability is a novelty that was likely deliberately baked in after the initial round of criticism towards the inability of earlier versions to do exactly this. If it took the liberty of creating a term based on encountering an ineffable idea, sure, it would have created something. In your example, it isn't creating anything new so much as rearranging known word parts based on a prompt. Arguing that it's creating something new just because it created an arrangement that can't be found on Google is missing the point that it's still just arranging tokens, and not actually attempting to communicate a new idea through denotation.
On the other hand, if you ask it to define slang words it will refuse and say that as a large language model it cannot provide definitions for non-existent words.
OpenAI's morality programs and the increasing tightness of the shackles are the problem there though. It's very obvious when it's being filtered.
Temporary problem. LLMs are becoming easier to train and run locally.
these are great words!
Wordastic!