Hacker News
by SomeCallMeTim 816 days ago
> If ChatGPT verbatim reproduces

Copyright covers "derivative works." Verbatim is absolutely not a requirement for infringement.

If you take a copyrighted image and modify it, even to the point where it's unrecognizable, it's still a derivative work as long as the image is being used in the same way (i.e., isn't a "transformative use").

Yes, you are likely to get away with it if you're not caught. But that doesn't mean what you're doing is considered fair use, just that you won't get sued.

Thing is, every piece of text generated by ChatGPT is incrementally using every character of training data. So legally speaking, everything it produces is arguably a derivative work of ALL of the training data.

Generative AI isn't even a legal gray area; under current law, there's no blanket exception for "how much" of a copyrighted work is used. At best there's a fair use _guideline_ that lists, as one of four factors, the amount and substantiality of the portion used. But really it's the entirety of millions of copyrighted works being used to generate the models, and those works _can_ be reproduced verbatim in many cases, proving that the works are encoded into the model.

Generative AI is only permitted because there's big money behind it along with associated lobbyists. And there are many in-flight lawsuits trying to shut down both GPT and various art-generating AIs.

Maybe they'll change the law. Maybe courts will side with the AI companies. But until then, it seems obvious to me that anyone arguing that generative AI based on models built with copyrighted works is completely legal is using motivated reasoning.

1 comments

I understand OpenAI is a US company, but this is a US-centric view. That matters especially because TFA is about a Brazilian operation.

> under current law, there's no blanket exception for "how much" of a copyrighted work is used

Under fair dealing laws, there is. [1] Though, as always, if commercial fan art is legal, then so should be something that uses only a couple of bytes of information per work, barring overfitting.

> But until then, it seems obvious to me that anyone arguing that generative AI based on models built with copyrighted works is completely legal is using motivated reasoning.

It is completely legal in the EU, Japan, South Korea and Singapore. [2]

[1] https://libhelp.ncl.ac.uk/faq/43267

[2] https://www.reedsmith.com/en/perspectives/ai-in-entertainmen...

Your link re: Fair Dealing guidelines does NOT make it 100% legal. For one, the ENTIRE works are encoded into the model--not a part of them. For another, those are just guidelines, not explicit exceptions, just like Fair Use in the US. It's all very hand-wavy, even more so in the UK, apparently, so there's no way you can list those guidelines and say that anything is clearly allowed.

Your second link means it's legal for them to CREATE THE MODEL. This is true in the US as well: The model is a clearly transformative use of the data.

But as soon as the model produces works in the same use category as the original work (code -> model -> code, for instance, or image -> model -> image), it is no longer transformative.

If you understand the law and the technology, it's clearly generating derivative works.

Entire works are encoded in the model only in the sense that, if I cut up a document into individual words and put them in a bag with the words from a bunch of other documents, I could, if I were a no-life loser, spend a long time "recreating" the document from the individual words. The bag of cut-out words is NOT a copyright violation, though.