Hacker News
by atleastoptimal 4 days ago
It's funny how worked up people get about copyright with respect to AI training when using copyrighted material for training an AI model is fair use. We have a concept of fair use in copyright because economic growth is essentially tied to the free proliferation of information.

I can't really trust any anti-AI argument when it feels more like a tribal grievance than a rational explanation of concern. Especially with the overuse of the "techbro" pejorative, it seems more a lament against a certain type of attitude in the tech world, and a resentment that this attitude has translated into massive material wealth.


> We have a concept of fair use in copyright because economic growth is essentially tied to the free proliferation of information.

Who is "we"? The concept of "fair use" is pretty specific to the US. There is no "fair use" in Germany, for example.

Personally, I like the idea of "fair use", but I don't think it applies to commercial closed-source AI model training. The US Copyright Office seems to agree with this assessment:

> Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs—all of which can affect the market. When a model is deployed for purposes such as analysis or research—the types of uses that are critical to international competitiveness—the outputs are unlikely to substitute for expressive works used in training. But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

> economic growth is essentially tied to the free proliferation of information.

Isn't this a good argument for models produced by proliferation of said information to also be open-weight then?

Or perhaps your line of thinking is more that AI companies shouldn't have legal recourse should the weights for a model get leaked because this too would clearly contribute to economic growth.

> It's funny how worked up people get about copyright with respect to AI training when using copyrighted material for training an AI model is fair use

Personally, I disagree with the finding that it is fair use. I think the fact that it was found to be fair use is a miscarriage of justice. Am I allowed to have dissenting opinions on that topic?

Yes of course, but why do you believe it's not fair use?

The essence of fair use is that it is OK if the use of the copyrighted material is transformative. I believe LLM training is transformative: it takes the data (a piece of text, an image, a video) and feeds it through the layers of a neural net to marginally update the net's weights. Each work ends up as a very, very small fraction of the model's overall sense of the world.

Now, AI models are capable of reproducing copyrighted works with a high degree of accuracy, especially if those works recur very frequently in the training data, but I believe that's a different issue. Any video camera is capable of recording a copyrighted work, but that isn't essential to the value of the camera, though it is undeniably what certain cameras are used for. Exact reproduction of copyrighted works is likewise something that LLMs can do, just as an intelligent person could recite song lyrics or any other work they had memorized, but each individual work is only marginal to the overall effectiveness and value proposition of the model.

> Now, AI models are capable of reproducing copyrighted works to a degree of high accuracy, especially if those works recur very frequently in the training data, but I believe that's a different issue

I disagree. I think it is the exact issue. I think that treating it as a different issue is legal stickhandling that goes against the spirit of what fair use is intended to be, to get the outcome that the rich and powerful want.

Extremely strict copyright laws were intended to give the rich and powerful the outcomes they want, allowing corporations to enrich themselves through IP gatekeeping. Until the last couple of years, being extremely litigious and offering little leeway for using copyrighted works was firmly aligned with the interests of corporations and wealth.
And now that it is advantageous for the corporations and the wealthy to ignore that, they are using it to hoover up public data to enrich themselves even further.

This reversal isn't making life any better for poor artists, you know. It's further consolidating power and wealth upward by looting the public (and not so public) domain.

Don't believe me? Disney is making deals with big image model providers to allow generation of its characters, while poorer online artists get nothing.

That's just one example.

> The essence of fair use is that it is ok if the use of the copyrighted material is transformative.

This is a necessary, but not a sufficient condition. Fair use is much more complex than you think.

Let’s say I created a robot that walked around the world and gathered data from its environment. Along the way it heard 20 copyrighted songs and looked at 40 copyrighted works. Should I owe royalties to the creators of those 60 works if I were to sell my robot?