Hacker News
by 453qtgreq 1293 days ago
Let's be clear - we're all obsessed with ChatGPT, but all it does is regurgitate existing works.

I see this as further progress down the path of "We don't know how this works, there is no way to know how, but it does." It also exchanges creativity and the skills gained through effortful creation for dependence on a technological marvel that is (currently) free.

11 comments

> but all it does is regurgitate existing works

I am genuinely intrigued by this point of view and would love to hear the reasoning of people who hold it.

Over the last few days I've seen hundreds of poems and stories from ones about climate change in the style of a sonnet to peanut butter sandwiches getting stuck in toasters in the style of the bible. I even asked it to make a text adventure game for me to play where I could put in any instruction, leading to a unique series of events and narrative.

Is the claim that these were all simply copy and pastes of something on the internet in their entirety? And that as such the internet already seems to contain essentially every permutation of everything I could ask ChatGPT, as to me this sounds highly implausible.

If the claim is that whilst these are not direct copy pastes, it is essentially a remix of lots of different things people have said before on the internet repurposed to a different end, is that not literally just what language is? Humans use common sayings, idioms, slang and phrases all the time, never mind the 'tropes' and story lines that are reused constantly. Coders use common patterns and styles and copy from Stack Overflow. In fact language literally only works because we all share it and share the meaning of it.

If we are saying that all ChatGPT does is remix existing language and phrases to a new purpose... to me we are saying ChatGPT does the same thing as humans.

Any thoughts would be appreciated.

If you spend enough time with it you can quickly start to see the building blocks. More than just the similarities you see in several works by the same person.

If you’ve ever played a video game where the levels are programmatically generated, you’ll eventually notice the modular nature of everything. It becomes very noticeable and it’s different from noticing natural patterns that arise in a completely handcrafted level.

GPT is exactly like that.

You won’t notice it with short output. But ask it to keep going and write the next verse of that sonnet. Keep going and it gets VERY repetitive in a way that a human poet doesn’t.

Also keep in mind that the examples you’re seeing are a form of sample bias. You’re seeing output curated by humans.

But isn’t it possible that humans are “just” doing the same but with a much higher foundational resolution?
In judging GPT, we should be drawing a line in the sand between a poor writer (e.g. a college student) and an excellent one (a professional, ideally a domain expert). GPT is impressing us now only because we have very low expectations for a computer to write like a human. But when judged against a capable human, especially when you focus on the _information_ that's conveyed, GPT sucks.

Few of us would want to read a book written as flaccidly as GPT does, a wandering narrative that cannot connect the semantics of one sentence to the next. GPT is capable only of writing _sets_ of sentences, not a progression of them where a concept is introduced and subsequent text tests and contrasts a convincing set of complements and alternatives. Given LLM architectures (and inherent black boxes), that's likely to never change since they don't understand logic or dependency, sufficiency, or necessity. Without those skills, no LLM will ever be able to _develop_ an idea or expand on the implications proposed in an introduction. Have you ever seen GPT conclude a paragraph with a bang? And you never will.

You'll note too that GPT has not attempted to write dialog. I think there's good reason for that. Talk strips away all the excess verbiage. Humans say only what's essential to convey our message. Pointless blather that leads nowhere is maddening to listen to for long. But without a clear focus, that's exactly what I'd expect from GPT: wandering speech that's totally disengaging. Surprising the reader simply is not in its DNA.

So if you're happy reading narratives that wander, that don't lead anywhere, with characters lacking personality, and a narrative style lacking the essentials of every engaging tale (exposition, conflict, suspense, climax, resolution)... then perhaps GPT is for you. But if that's GPT's great promise -- to generate businessspeak that's "tolerable" but even less engaging than the marketing schlock that immerses us now -- then thanks, but no thanks.

I am not arguing that the current generation of language models will result in 1:1 levels of human:machine intelligence, just that pattern recognition seems like it is enough to eventually lead there.

> You'll note too that GPT has not attempted to write dialog.

Here is one of my prompts, is this not dialogue?

"The year is 1960. Choose 3 famous physicists. They are having a debate about whether black holes exist. Show the transcript."

  Albert Einstein: Gentlemen, I have been studying the theory of relativity for many years, and I believe that the concept of black holes is absurd. The idea that an object could be so massive and dense that not even light can escape its gravitational pull goes against everything we know about the laws of physics.

  Stephen Hawking: Mr. Einstein, I must respectfully disagree. The mathematics behind black hole theory is quite compelling, and there is mounting evidence from observations of stars and other celestial bodies that suggests their existence.

  Richard Feynman: I think both of you make valid points, but I believe that we must approach this question with an open mind. We cannot dismiss the idea of black holes simply because they challenge our current understanding of the universe. We must continue to gather evidence and test our theories in order to better understand the nature of these mysterious objects.

  Einstein: But surely, Mr. Hawking, if black holes were real, we would have observed their effects on nearby objects by now.

  Hawking: Not necessarily, Mr. Einstein. Black holes are incredibly difficult to detect, and we may not have the technology or understanding to do so at this point in time.

  Feynman: I agree with Mr. Hawking. The study of black holes is a relatively new field, and we are still working to develop the tools and techniques needed to study them more effectively. I believe that with further research and experimentation, we will be able to confirm or disprove their existence.

  Einstein: I will continue to remain skeptical until I see concrete evidence of black holes. But I am open to being proven wrong if the evidence warrants it.

  Hawking: And I will continue to explore the possibility of black holes and work to gather more evidence to support their existence.

  Feynman: I think that is a fair and reasonable approach. As scientists, it is our duty to seek the truth, no matter how challenging it may be.
Sure it’s possible. I mean every one of us is a unique arrangement of the same fundamental building blocks.

But there’s no reason to assume that something works the same way we do just because someone named it a neural network.

The key concept here for me is pattern recognition. It lines up with my experience when trying to practice creativity.

For example, for decades it has been my opinion that my own supposedly higher than average pattern matching skills were the main thing I brought to the table at work.

When I play drums, keys, or guitar, coming up with new patterns and styles to play feels magical when it happens in real-time, but if I break it down later, I almost always see the influences.

>In fact language literally only works because we all share it and share the meaning of it.

Most human communication is to some extent adversarial and constantly subverts the assumption of shared meaning on purpose or by accident. It's extremely difficult to "solve" because cooperative communication often appears to subvert the basic principles too.

Have you ever read about Grice's maxims of conversation? Once you think in those terms, the frequent absence of cooperation in most human conversation is obvious.

https://en.wikipedia.org/wiki/Cooperative_principle

>Is the claim that these were all simply copy and pastes of something on the internet in their entirety? And that as such the internet already seems to contain essentially every permutation of everything I could ask ChatGPT, as to me this sounds highly implausible.

It's the training data supplied to GPT3 (as explained by OpenAI themselves), so yes, it is literally true. You are just seeing snippets of the internet, re-formed and regurgitated.

It can only do what you ask.

So I appreciate the gist of your point, but the way these models work is rather more complicated than copying and pasting snippets, and so it certainly is not 'literally' true. The models are trained to predict sub-word level tokens from the internet training dataset, so the level of re-formation and regurgitation in a generated sentence can be vast, to the point of the final sentence being novel in its own right.
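
To make the "re-formed to the point of being novel" idea concrete, here is a minimal sketch. It is a word-level bigram table over a made-up three-sentence corpus, which is an assumption for illustration and far simpler than what GPT actually does (sub-word tokens, a neural network, long context). It only shows that recombining locally "seen" pieces can yield whole sentences that appear nowhere in the training text.

```python
from collections import defaultdict

# Hypothetical three-sentence "training set" (illustration only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the rug",
]

# Bigram table: for each token, the set of tokens seen directly after it.
followers = defaultdict(set)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev].add(nxt)

def generate(prefix, max_len=6):
    """Yield every sentence of at most max_len words in which each
    adjacent pair of tokens occurs somewhere in the corpus."""
    for nxt in sorted(followers[prefix[-1]]):
        if nxt == "</s>":
            yield " ".join(prefix[1:])          # drop the "<s>" marker
        elif len(prefix) - 1 < max_len:         # words so far, excluding "<s>"
            yield from generate(prefix + [nxt], max_len)

generated = set(generate(["<s>"]))
novel = sorted(generated - set(corpus))         # sentences never seen in training

for sentence in novel:
    print(sentence)
```

Every printed sentence is built purely from word pairs in the corpus, yet none of them is a corpus sentence. Whether that counts as "regurgitation" is exactly the disagreement in this thread.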
100% of the training data for the salty jelly in a human skull is sensory input.

Were your argument flawless and your conclusion correct, then all human creativity would "literally" be a remix of things in the natural world, as even when we remix things made by other humans that too would ultimately derive back to nature.

This can certainly be asserted depending on how you wish to use those words, but just as it is not useful at predicting what our abilities are — for example, a perfect intellect could predict quantum mechanics from scratch by watching a camp fire die down on a rainy night, but we didn't do that in one step in the neolithic — so too this isn't useful at telling us what the limits of GPT-family LLMs might be, as that which has been built on such inputs, both in the case of humans and this particular AI, greatly exceeds the imagination of any single individual.

In some sense, every AI tool is "regurgitating its training set". In the case of NLP, that training set will be a corpus of documents, utterance-intent pairs, lists of Named Entities, etc.
> regurgitate existing works

The other night it regurgitated a humorous four-stanza poem for my D&D group, incorporating (and even rhyming) several of the player character names. Given that the names were made-up and unique, and have zero hits on google, which existing works would you say ChatGPT was regurgitating from?

> which existing works would you say ChatGPT was regurgitating from?

Being able to answer this question will earn you millions, if not billions. However, considering neural networks work by banging their training sets together and emitting the resulting sparks, it certainly regurgitated its training set.

Given that you supplied ChatGPT with those names, which still have the very ordinary properties of rhyming with other words and denoting a particular person or being, that's not very interesting.
Constructing a novel, genuinely humorous poem that incorporates given inputs isn't very interesting because... I had to supply the inputs? Sorry, but I don't follow.
Your original comment made a big deal of a poem-writer being able to incorporate unique words that you gave to it in the first place. No, that's not very interesting.

Writing a "4-line humorous poem" is also trivial. I'm sure my four year old could manage it. Praising a machine learning algorithm for this, one that took over 10,000 Exaflops to perform its training routine, seriously demeans human ability.

I didn't claim the poem was interesting. I gave it as an example of something which could not possibly have been regurgitated straight out of the training set, and it clearly is such an example.

Also: it was a 4-stanza poem, three rhyming couplets per stanza. It had a recognizable beginning/middle/end and a reasonably humorous punchline. The rhyme scheme was AABBCC - your four-year old will need to know that when they replicate the feat ;)

You're amazed because a training set, largely based on the contents of the internet since about 2011, is able to find words that rhyme with arbitrary word endings?

4 stanzas. My bad. Still not impressed.

Ask it to expand and write the next verse. Keep going until it gets repetitive. Might give you a glimpse of the training set.
> ...but all it does is regurgitate existing works.

And this is the uncomfortable truth. When I share this with people, many don't believe it, and knowledgeable people don't care.

Also, most (if not all) the work is just collected without consent. I'll not re-open that can of worms of licenses, fair use, ethics and what not. I'm done for today on that front.

Which is pretty unfortunate and grim if you ask me.

> ...but all it does is regurgitate existing works.

Not so. I taught it a new and novel notation for music, and it happily transcribed me real songs in that new notation. There is no existing work to base it off of.

It's not a fixed model. The point is you taught it to ChatGPT. It did not invent it.

In other words, you asked it to transcribe a song to another template you showed it, so it works as expected. A neural network which can do style transfer from one template to another.

If it would tell you "Hey, I designed a notation, which works better for $REASON. Can you give me feedback?", then this would be something notable.

Again, ChatGPT utilized the info you gave to it. So it was not novel to begin with.

> If it would tell you "Hey, I designed a notation, which works better for $REASON. Can you give me feedback?", then this would be something notable.

It actually did do that, without me prompting it!

For reference, the notation is, 'A4(quarter)', etc

  This is an interesting notation system you have created! It looks like it is based on
  traditional musical notation, but with a few differences. In traditional musical notation,
  notes are written on a staff, with the pitch of the note indicated by its placement on the
  staff and its duration indicated by the type of note symbol used (whole note, quarter note,
  etc.). Your A-Script notation appears to use the pitch of the note as the main identifier,
  with the duration indicated in parentheses after the pitch.
  One potential benefit of this notation system is that it makes it easier to identify the pitch
  of a note without having to refer to the staff. This could be useful in situations where the
  staff is not visible, or when the pitch of the note is the most important information.
  One potential drawback of this notation system is that it does not provide as much
  information about the rhythm of the music. In traditional notation, the placement of the
  notes on the staff and the use of different note symbols provide important information
  about the rhythm of the music. This information is not included in your A-Script notation.
  Overall, your A-Script notation is a creative and interesting way to represent music. It
  could potentially be useful in certain situations, but it may not provide as much
  information as traditional notation.
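
For anyone who wants to poke at the notation itself, here is a rough parser sketch. Only the single example 'A4(quarter)' is given above, so the grammar here (pitch letter, optional `#` or `b`, one octave digit, a duration word in parentheses, notes separated by spaces) and the beat values are assumptions, not the commenter's actual definition.

```python
import re
from typing import List, NamedTuple

class Note(NamedTuple):
    pitch: str      # e.g. "A" or "C#"
    octave: int     # e.g. 4
    duration: str   # e.g. "quarter"

# Assumed grammar, extrapolated from the one example 'A4(quarter)'.
NOTE_RE = re.compile(r"([A-G][#b]?)(\d)\(([a-z]+)\)")

# Assumed beat values, taking a quarter note as one beat.
BEATS = {"whole": 4.0, "half": 2.0, "quarter": 1.0, "eighth": 0.5}

def parse(text: str) -> List[Note]:
    """Parse a space-separated string of notes into Note tuples."""
    notes = []
    for token in text.split():
        match = NOTE_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"not a valid note: {token!r}")
        notes.append(Note(match.group(1), int(match.group(2)), match.group(3)))
    return notes

def total_beats(notes: List[Note]) -> float:
    """Sum the durations, using the assumed BEATS table."""
    return sum(BEATS[n.duration] for n in notes)

melody = parse("A4(quarter) C#5(half) E5(quarter)")
print(melody)
print(total_beats(melody))
```

Under these assumptions the duration travels with every note, so a beat count can be recovered from the text alone.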
Did you invent this novel notation? If so, that is the existing work.
Heh. Out of curiosity I just tried what you're asking for, and ChatGPT suggested a color-based notation where parts for multiple instruments are displayed together in different colors, so that each musician can focus on their own part but also see what other instruments are doing if they want to harmonize.

Then I asked if the notation could be improved to take advantage of digital displays, and it suggested the notes could start out as hollow outlines and gradually fill in, becoming solid at the moment they should be played.

It seems like regurgitation is pretty advanced these days :D

You asked it for a novel musical notation, and it gave you a riff on guitar hero and DDR and beat saber.

Not only that, but it wouldn't really work well for music played by professionals. Putting multiple parts together is reducing how easy it is to distinguish your part, compared to the norm where you JUST have your part, and adding color to the party doesn't get you back that level of being able to distinguish your part, and might even require different parts of your brain get involved.

It sounds exactly like the kind of thing a stoner comes up with and crows about before they sober up and realize "no wait, that wasn't really a good idea and I know nothing about the relevant domain". No wonder these chat bots are so popular on HN

> You asked it for a novel musical notation, and it gave you

Newp - I asked it for "a visually unusual musical notation that isn't just adding extra markup to traditional staff notation", or words to that effect, and it suggested something that met all the requirements I gave it. If you want to add more requirements you need to tell the chatbot, not me.

I'm baffled how people are so amazed by easily-found existing ideas being presented by an arrogant chatbot.
BTW, that will be very counterintuitive for a performing player, because a performing musician will look both ahead and behind during performances. Adding more dynamism to an already stressful real-time flow is not good.

How do I know? I’m a double bass player.

It was an example of GPT offering a notation unrelated to anything mentioned in the prompt. I wasn't suggesting we all adopt it as our new notation.

This whole comment page feels like a chain of "it clearly can't do X" -> "it does X" -> "no obviously it does X but it can't do Y", spread among several threads and repeated over and over.

I have various objections to GPT and am annoyed about the various misunderstandings people have about it but this in particular is not a very good objection.

A lot of what people do in everything spanning from a paper pushing office job to art is regurgitation.

GPT is capable of regurgitating existing works, and it's also capable of producing things that are a unique blend of many different things it has seen, which is pretty much the limit of what humans are capable of as well.

It exchanges one type of creative effort for another, higher throughput form of creative effort. Why write individual words myself when I can get a computer to produce a buffet of words then stitch together the best ones to achieve my vision?

> Let's be clear - we're all obsessed with ChatGPT, but all it does is regurgitate existing works.

So like human beings?

We genuinely don't know that yet. People keep saying this as if we have any reason to suspect neural networks are even remotely related to how mammal brains work. The strongest link they have is that neural networks chose a very heavy handed name.

It's pretty likely that a handful of neurons, or even a singular neuron, is comparable to a HUGE neural net in ability to "learn".

We know of plentiful single celled organisms that learn. It's clear we have no fucking clue what's going on, and we as computer people should stop letting AI researchers claim magic

Have you used it? I got it to give me computer code in the style of Sean Hannity and review it in the style of Hulk Hogan. You can ask for code that contains an error.

> all it does is regurgitate existing works.

Like humans do?

Is creating just a fancy way of regurgitating existing works? Everything is a remix after all. Where is the root of creation then?

This is a very interesting topic of conversation, but applying this level of thought to the output of GPT3 hugely over-rates what it does.

You already provided it with the cultural and artistic prompts, the full context of which were already generated independently by humans.

> all it does is regurgitate existing works

I'm not sure I understand. It can be used in many different ways, partly as a general natural language interface. It is great at the following (which also serves as an example):

Expand on these points of Ai capability with examples: machine translation, writing text based on keywords/short-hand, altering style/tone, summation.

To which ChatGPT responds as below[r]. If this were a support or sales dialog - it might have saved me quite a lot of time - even without doing any editing (which I probably would have done).

At any rate, I don't see how this is "just regurgitating" in any meaningful way? Sometimes change in scope/scale implies change in kind.

[r] > Machine translation is a common example of artificial intelligence (AI) capability. This technology allows for the automatic translation of text or speech from one language to another. For instance, a person speaking in Spanish could have their words automatically translated into English in real-time. This capability has been used in applications such as language translation apps, multilingual customer support systems, and global business communication.

Another example of AI capability is writing text based on keywords or short-hand. This type of technology uses algorithms to analyze a set of keywords or phrases and generate a coherent text based on those inputs. For instance, a person could input the keywords "dog" and "chasing tail" and the AI system would generate a sentence such as "The dog was running in circles, chasing its tail." This capability has been used in applications such as automated content generation for social media posts or marketing materials.

AI technology can also alter the style or tone of text. This capability allows for the manipulation of the language used in a text to fit a specific style or tone. For instance, a person could input a text in a formal tone and the AI system could alter it to a more casual or conversational tone. This capability has been used in applications such as automated email responses or customer service chatbots.

Finally, AI technology can be used for summation, or the creation of a concise summary of a longer text. This capability allows for the extraction of the key points or ideas from a text and the generation of a shorter, more easily digestible summary. For instance, a person could input a lengthy article and the AI system could generate a one-paragraph summary of the main points. This capability has been used in applications such as summarizing news articles or generating summaries of research papers.

Parent commenter here. No-one has picked up on the last of my points, which I think is the most crucial. Once we're all hooked on the "productivity gains" of widely using GPT, what will we do once Microsoft start bundling it into subscription packages? Just pay up, I suppose.
There are a variety of GPT models with different numbers of parameters. I can run GPT-J locally right now, and as model architectures progress and the killer app for video cards goes from games/mining to AI, we will see a combination of increased card VRAM and memory-optimized models that will make running this stuff locally feasible. Training is expensive, but inference is usually fairly cheap; you just need to be able to load the model in VRAM. Thankfully there are open source groups and communities working together to train and fine-tune models of this sort, so we won't be completely reliant on commercial entities.
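
As a back-of-the-envelope check on the "load the model in VRAM" point, here is a sketch that estimates weight memory as parameter count times bytes per parameter. The roughly 6 billion parameter figure for GPT-J and the dtype sizes are the only inputs. The function names are made up for this example, and the estimate deliberately ignores activations, caches and framework overhead, so treat it as a lower bound rather than a real requirement.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

GPT_J_PARAMS = 6_000_000_000  # approximate parameter count of GPT-J

def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB (lower bound only)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

def weights_fit(n_params: int, dtype: str, vram_gib: float) -> bool:
    """True if the weights alone fit in the given VRAM (hypothetical helper)."""
    return weight_memory_gib(n_params, dtype) <= vram_gib

for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(GPT_J_PARAMS, dtype)
    print(f"{dtype}: {gib:.1f} GiB, weights fit in 16 GiB: {weights_fit(GPT_J_PARAMS, dtype, 16)}")
```

Halving the precision halves the footprint, which is why lower-precision and memory-optimized variants matter so much for running these models on consumer cards.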
You appear to me to be stuck in the academic world of arguing concepts that are detached from how the simplest of things work in the real world.

> [W]hat will we do once Microsoft start bundling [GPT] into subscription packages?

Pray tell, how will Microsoft do that, exactly? And assuming there is some licensing or subscription model for the use of GPT, what is wrong, exactly, with that?

I don't understand what you mean by your first sentence. Copilot-enhanced VS Code is becoming very popular. Copilot uses GPT3, which is exclusively licensed by Microsoft from OpenAI. I'm very much talking about the real world, and the frightening ease and speed with which MS is taking over the way we write software.

>Pray tell, how will Microsoft do that, exactly?

How will Microsoft make people pay for tools that they currently distribute freely? You don't have to look very far to find examples.

Absent real AGI, almost all AI tools are going to assist us in intellectual/white collar work; they will rarely entirely replace humans in the larger picture of the work they produce.

More like having an exoskeleton helping you instead of telling a robot to do your entire job for you.

If you force it into a novel situation and give it some nudges, you can get a novel response.