Hacker News
by atleastoptimal 1 day ago
Yes of course, but why do you believe it's not fair use?

The essence of fair use is that it is ok if the use of the copyrighted material is transformative. LLM training, I believe, is transformative: it takes the data (a work of text, an image, a video) and feeds it through the layers of a neural net to marginally update its weights. Each work is factored into a very, very small fraction of the model's overall sense of the world.

Now, AI models are capable of reproducing copyrighted works to a degree of high accuracy, especially if those works recur very frequently in the training data, but I believe that's a different issue. Any video camera is capable of taking a photo of a copyrighted work, but that isn't essential to the value of the camera, though it is undeniably what certain cameras are used for. The exact reproduction of copyrighted works is likewise something that LLMs can do, just as any intelligent person could recite song lyrics or a work if they memorized it, but each individual work is only marginal to the overall effectiveness and value proposition of the model.

2 comments

> The essence of fair use is that it is ok if the use of the copyrighted material is transformative.

This is a necessary, but not a sufficient condition. Fair use is much more complex than you think.

Let’s say I created a robot that walked around the world and gathered data from its environment. On its way it heard 20 copyrighted songs, looked at 40 copyrighted works. Should I owe royalties to the creators of those 60 works if I were to sell my robot?

> Now, AI models are capable of reproducing copyrighted works to a degree of high accuracy, especially if those works recur very frequently in the training data, but I believe that's a different issue

I disagree. I think it is the exact issue. I think that treating it as a different issue is legal stickhandling that goes against the spirit of what fair use is intended to be, to get the outcome that the rich and powerful want.

Extremely strict copyright laws were intended to give the rich and powerful the outcomes they want, allowing corporations to enrich themselves on IP gatekeeping. Until the last couple of years, it was firmly in the interest of corporations and the wealthy to be extremely litigious and offer little leeway for using copyrighted works.
And now that it is advantageous for the corporations and the wealthy to ignore that, they are using it to hoover up public data to enrich themselves even further.

This reversal isn't making it better to be a poor artist, you know. It's further consolidating power and wealth upward by looting the public (and not so public) domain.

Don't believe me? Disney is making deals with big image model providers to allow generation of their characters. Meanwhile, poorer online artists will get nothing.

That's just one example.