by Chatting 991 days ago
"The fair use of a copyrighted work [...] for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work."

-- 17 U.S. Code § 107 (https://www.law.cornell.edu/uscode/text/17/107)

I don't know how one can read this as an impartial observer and make an honest argument that OpenAI is in the right.

Their use of copyrighted material does not fit any of the purposes enumerated in the first paragraph; it fails criterion #1 because it is of a commercial nature; it fails criterion #2 because it includes all kinds of works; it fails criterion #3 because it's not limited to very small extracts; and it fails criterion #4 because their products are already having an obvious effect on the market.

6 comments

Incidentally, this will provoke a reaction: companies will make their content unavailable on public sources so that they aren't working for free to feed AI freeloaders.

The opposite view is also valid. SEO types will figure out how to get their BS into models so that the models recommend their stuff.

As AI models tend to replace the “search” market, they will become as useless as today's search tech.

I don't think it fails (3) specifically, and the others are moot if it doesn't fail (3). 99.9999% of the time, it just straight up does not reproduce any concrete part of the piece.

It just reproduces some very very hard to quantify tiny fraction of the logic or idea of the document.

It's like saying someone is infringing copyright because a document they read has had an unimaginably tiny effect on their general writing skills, and because they're able to recite it if they're asked to.

but it also fails on the very philosophical framework used to come up with those laws

they're using philosophical frameworks older than digital computers, which is ok, and is as it should be, up to the point where those ways of understanding fail to capture certain qualities of computers and digital technology which break the whole notion of copyright, and have been breaking it for a few decades now.

those lawmakers are using obsolete philosophy!

but we gotta wait until they all die off, it's not like people, especially older people, are willing or possibly able to change how they think

I blame digital technology, it just doesn't work like the rest of reality does...

using obsolete philosophy has never stopped a supreme court judge from upholding shit written by the founding fathers
but they're using the 'same' obsolete philosophy as the founding fathers used when writing those things

so it's correct that they do this, else they would change the meaning behind those words.

what is needed is that congress approves new philosophy so they can use it to make better (meaning up to date) laws which are appropriate for this technological epoch

and what is needed for that is that the academic community comes up with new philosophy

and what is needed for that I do not know

Your post would be more substantial if it were informed by the linked article's arguments against the points you raise. In general, looking only at the text of a law is insufficient as a basis for legal analysis.
and after that court case, I’ll go after other accepted fair use works under a 14th amendment challenge saying it needs to apply equally to other use cases simply because ads are being used, or a university charges for a book, or a publication charges for subscription access

which I’m fine with, don't think I’m trying to deter you with a slippery slope fallacy, we’re already at the bottom of it

The use here is as training data. I'm pretty sure there isn't much of an effect on the market for training data.
And what does the training data then do? Just sit in a box, or produce content for "direct human consumption" that would have market impacts?

If all anyone has to do to invalidate copyright law is introduce an intermediary step where it is used for a non-consumption purpose, then copyright is already dead.