Hacker News
by JoshTriplett 108 days ago
> But including your art in the training data is fair use

The four factors of fair use in the US:

> the purpose and character of your use

Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.

> the nature of the copyrighted work

Absolutely everything. Artistic, creative, not purely factual.

> the amount and substantiality of the portion taken, and

All of it, from everyone.

> the effect of the use upon the potential market.

Directly competing with those whose data was copied.

3 comments

Factors 3 and 4 are what that argument is based on, I believe: 3) on the basis that the output does not _reproduce_ the input, and 4) on the similar grounds that output which is not at all the same as the input data isn't affecting the market for the original image. (I think 4 is the more debatable one, but in general the existing cases have struggled at the early stages because the plaintiffs have not been able to point to output that is a copy of their part of the input, and this does matter.)

> Directly competing with those whose data was copied.

An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.

> All of it, from everyone.

With the result that anything produced by the LLM does not reproduce any single source in its entirety (and where a model can be compelled to do that, it is a bug, not a feature).

Fair use is too specific, tbh. Rather than ruling it fair use (which seems to be where things are going), it should just be ruled "use". There's nothing wrong with building a mathematical model using available data.

> An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.

Yes, it does. Many people are using AI-generated works in places where they originally would have either paid an artist, programmer, or other creative professional, or done without. Many companies are claiming to reduce staff because of AI (whether that's true or an excuse). There is plenty of evidence that AI is directly competing with various individuals, businesses, and industries.

> With the result that anything produced by the LLM does not reproduce any single source in its entirety

You do not have to reproduce sources in their entirety to produce derivative works.

> the amount and substantiality of the portion taken, and

> All of it, from everyone.

Yeah, I'd like to see how drawing two circles violates the copyright of drawing one circle!