| HN Mirror

	by NicuCalcea 935 days ago
	I think it may change the discussion about copyright a bit. I've seen many arguments that while GPTs are trained on copyrighted material, they don't parrot it back verbatim and their output is highly transformative. This shows pretty clearly that the models do retain and return large chunks of text exactly as they read them.

bonzaidrinkingb 935 days ago

I suspect ChatGPT is using a form of clean-room design to keep copyrighted material out of the training set of deployed models.

One model is trained on copyrighted works in a jurisdiction where this is allowed and outputs "transformative" summaries of book chapters. This serves as training data for the deployed model.

LeifCarrotson 935 days ago

The article describes how the deployed model can regurgitate chunks of copyrighted works - one of the samples literally ends in a copyright notice.

bonzaidrinkingb 935 days ago

If these were copyrighted works, how did these end up in the public comparison dataset?

Sure, some copyrighted works ended up in the Pile by accident. You can download these directly, without the elaborate "poem" trick.

a1o 935 days ago

That sounds like copyright washing, if there is such a thing.

jnwatson 935 days ago

If that's copyright washing so are Cliff's Notes.

xp84 935 days ago

Yup, though a lot of people are acting now as though every already-established principle of fair use needs to be revised suddenly by adding a bunch of "...but if this is done by any form of AI, then it's copyright infringement."

A cover band who plays Beatles songs = great
An artist who paints you a picture in the style of so-and-so = great

An AI who is trained on Beatles songs and can write new ones = exploitative, stealing, etc.
An AI who paints you a picture in the style of so-and-so = get the pitchforks, Big Tech wants to kill art!

blitzar 935 days ago

> A cover band who plays Beatles songs

Has to pay the Beatles for the pleasure of doing so.

whstl 934 days ago

This discussion about art "in the style of" being stealing or exploitative didn't start with AI. For quite some time there have been complaints about advertisers commissioning sound-alike tunes to avoid paying for licensing. AI is only automating it and making it possible on an industrial scale.

lewhoo 935 days ago

Well, I don't know about that. I strongly suspect ChatGPT could deliver whole copyrighted books piece by piece. I suspect that because it most certainly can do that with non-copyrighted text. Just ask it to give you something out of the Bible or Moby Dick. Cliff's Notes can't do that.

whatshisface 935 days ago

Why would you suspect that?
