| HN Mirror


	by rasz 925 days ago
	That's because the word 'training' is doing all the heavy lifting here. Think of it as copying, compressing and storing all the copyrighted material in a database. Humans learn, humans train; computers encode data. You would never say ffmpeg learned a movie.

chii 925 days ago

> You would never say ffmpeg learned a movie.

No, you wouldn't, but these diffusion models do far more than ffmpeg, and they do qualitatively different things.

I am on the fence, but I lean towards the view that training an AI on existing works is not infringement, as long as the AI's output is (or can be) majority new works. For example, a poor training algorithm that merely repeats the training dataset (and cannot output new works) is infringing, while a different algorithm (such as the current Stable Diffusion one) that can output works that have never been made and are totally new does not infringe. After all, style and ideas are not protected by copyright, and if the algorithm managed to extract those ideas from the training set, all the better.

TimPC 925 days ago

Majority new works is not a good enough standard. If any output is a direct reproduction of a copyrighted input, that output is copyright infringement, whether it was intended or not. If the trainer of the model doesn’t want to be sued for infringement, they are responsible for a robust safety mechanism that prevents it. If such a safety mechanism isn’t possible, then don’t use copyrighted works if there is any possibility of directly reproducing them.
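The "robust safety mechanism" isn't spelled out here. One crude possibility, sketched below under my own assumptions (the function names, the word-level matching and the window size `n` are all hypothetical, not any vendor's actual filter), is to reject an output if it shares a long run of consecutive words with any training document:

```python
# Hypothetical output filter: flag generated text that shares any run of
# n consecutive words with a document in a (toy) training corpus.
# The window size n is an assumed tuning knob, not a legal threshold.


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return every n-word window in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(corpus: list[str], n: int) -> set[tuple[str, ...]]:
    """Collect every n-gram that appears anywhere in the training corpus."""
    index: set[tuple[str, ...]] = set()
    for doc in corpus:
        index |= word_ngrams(doc, n)
    return index


def reproduces_training_text(output: str, index: set[tuple[str, ...]], n: int) -> bool:
    """True if output contains at least one n-gram copied verbatim from the corpus."""
    return not word_ngrams(output, n).isdisjoint(index)


corpus = ["the quick brown fox jumps over the lazy dog"]
index = build_index(corpus, 5)
print(reproduces_training_text("a quick brown fox jumps over the lazy cat", index, 5))  # True
print(reproduces_training_text("a slow red fox naps under a busy dog", index, 5))  # False
```

This only catches exact, word-for-word copying of text. It says nothing about near-copies, paraphrases or images, and a real training corpus would need a scalable index rather than an in-memory set.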

chii 925 days ago

> If any output is a direct reproduction of a copyrighted input that output is copyright infringement

So by that standard, why isn't Photoshop a copyright infringement? You can use it to create a copy just the same.

TimPC 925 days ago

Photoshop isn’t a copyright infringement inherently but producing an infringed image with photoshop is still infringement. Much the same way AI is not inherently infringement but any production of infringing content by the AI is still infringement.

ddol 925 days ago

What’s the test for “has never been made and is totally new”?

If I look at a photo of Prince and then, using that image as a reference, create a new silkscreen painting, is that fair use or infringement?

Because the US Supreme Court has ruled that the instance I referenced was infringement, as both images were used for magazine covers [0].

[0] https://www.nbcnews.com/news/amp/rcna64624

chii 925 days ago

> What’s the test for “has never been made and is totally new”?

The existing copyright rulings are sufficient to determine this, and it has nothing to do with AI models.

You've already pointed out a case: if you use an AI to generate an image that has sufficient likeness to an existing one, then the AI portion is irrelevant to the ruling. You could've made that same image in Photoshop without AI, and you should get the same ruling.

But in the above circumstance, the silkscreen used in the creation of the image does not itself infringe. Replace that silkscreen with an AI model and nothing has changed.

davely 925 days ago

> Think of it as copying, compressing and storing all the copyrighted material in a database.

But it isn’t. It’s just a series of vectors that point to a likely occurrence of the next word or pixel or bit in a sequence.

rasz 925 days ago

You are trying to argue encoding semantics, but at the end of the day the "AI" was completely happy to recite Carmack's fast inverse square root, including the original comments, verbatim.

https://twitter.com/StefanKarpinski/status/14109710611816816...

davely 925 days ago

With the way these AI models work, that data isn’t stored in a database though.

It’s hard for people to understand this concept, but the fact that a model repeated some data verbatim is a happy coincidence (!) based solely on patterns in the data it has seen before.

I think people also have a hard time with how these models are trained. They vacuum up all sorts of data and learn from it by creating vectors that determine how follow-up data should be generated.
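To make the "vectors that point to a likely next word" idea concrete, here is a deliberately tiny stand-in: a word-bigram model that stores only counts of which word follows which. Real models learn dense vectors with neural networks, so this is an analogy chosen for illustration, not how they work internally, and all names are hypothetical. It also shows how verbatim recitation can happen anyway: when every context in a passage is unique, always picking the most likely next word replays the passage exactly.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often.
    These counts are the only thing this toy model keeps."""
    words = text.split()
    follows: defaultdict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return dict(follows)


def next_word_probs(model: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the follow-up counts for one word into probabilities."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def generate_greedy(model: dict[str, Counter], start: str, max_words: int) -> str:
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    while len(out) < max_words and model.get(out[-1]):
        out.append(model[out[-1]].most_common(1)[0][0])
    return " ".join(out)


print(next_word_probs(train_bigrams("the cat sat on the mat"), "the"))
# {'cat': 0.5, 'mat': 0.5}

passage = "every word in this sentence appears exactly once so it is memorized"
print(generate_greedy(train_bigrams(passage), "every", 50) == passage)  # True
```

Nothing resembling a copy of the passage is stored, only follow-up statistics, yet the output is word for word the training text.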

Sure, the original creators of this content aren’t being compensated or even recognized for it. I don’t have a good idea of how that should be handled.

For normal humans though, looking at art or reading a book, and later repeating some passage or drawing something from your own memory is not a crime. (Unless you’re sharing the DeCSS source code, I guess…)

Slightly changing the topic here, but I do wonder what would happen if someone wrote a program called “Monkeys on Typewriters” that just iterated through various combinations of characters (or bits or pixels) and was able to recreate things verbatim.

Is that random happenstance copyright infringement?
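A “Monkeys on Typewriters” program as described could be as simple as enumerating every string over an alphabet, shortest first, until one matches. The function names below are hypothetical. The sketch also shows why such a program only recreates things in principle: there are |alphabet|^n candidate strings of length n, so anything longer than a few characters is out of reach.

```python
from itertools import count, product


def monkeys_on_typewriters(target: str, alphabet: str) -> int:
    """Brute-force search: return how many candidate strings were tried
    (shortest first, in alphabet order) before producing target exactly."""
    if not target or not set(target) <= set(alphabet):
        # Without this guard the search would never terminate.
        raise ValueError("target must be non-empty and use only alphabet characters")
    attempts = 0
    for length in count(1):  # lengths 1, 2, 3, ...
        for chars in product(alphabet, repeat=length):
            attempts += 1
            if "".join(chars) == target:
                return attempts


def candidates_up_to(length: int, alphabet_size: int) -> int:
    """Number of strings of length 1..length, i.e. the worst-case attempt count."""
    return sum(alphabet_size ** k for k in range(1, length + 1))


print(monkeys_on_typewriters("cab", "abc"))  # 32
print(candidates_up_to(3, 3))  # 39
```

With the three-letter alphabet "abc", finding "cab" takes 3 one-letter, 9 two-letter and then 20 three-letter attempts. With the roughly 95 printable ASCII characters, a single 20-character line already has 95^20 candidates of that length alone.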

dragonwriter 925 days ago

> For normal humans though, looking at art or reading a book, and later repeating some passage or drawing something from your own memory is not a crime.

False, actually; memorizing a copyrighted work and reproducing it other than in conditions specifically excepted from copyright protection is a violation of the exclusive rights of the copyright holder to make copies.

surfacedetail 925 days ago

Reciting common text or common license elements and commentary isn't necessarily copyright infringement.

gpm 925 days ago

You would never say ffmpeg stole a movie either...
