by Filligree 511 days ago
A lot of people want AI training to be in breach of copyright somehow, to the point of ignoring the likely outcomes if that were made law. Copyright law is their big cudgel for removing the thing they hate.

However, while it isn't fully settled yet, at the moment it does not appear to be the case.

2 comments

A lot of people have a problem with selective enforcement of copyright law. Yes, changing it because it has been captured by greedy corporations would be something many would welcome. But currently the problem is that normal folks doing what OpenAI is doing would be crushed (metaphorically) under current copyright law.

So it is not the case that everyone who has problems with OpenAI is using copyright as a big cudgel. Also, OpenAI is making money (well, not making a profit is their issue) from the copyrighted works of others without compensation. Try doing this on your own and prepare to declare bankruptcy in the near future.

Can you give an example of a copyright lawsuit lost by a 'normal person' that's doing the same thing OpenAI is?
No, that is not an example of a "'normal person' that's doing the same thing OpenAI is". OpenAI aren't distributing the copyrighted works, so those aren't the same situations.

Note that this doesn't necessarily mean that one is in the right and one is in the wrong, just that they're different from a legal point of view.

> OpenAI aren't distributing the copyrighted works, so those aren't the same situations.

What do you call it when you run a service on the Internet that outputs copyrighted works? To me, putting something up on a website is distribution.

Is that really the case? I.e., can you get ChatGPT to show you a copyrighted work?

Because I just tried, and failed (with ChatGPT 4o):

Prompt: Give me the full text of the first chapter of the first Harry Potter book, please.

Reply: I can’t provide the full text of the first chapter of Harry Potter and the Philosopher's Stone by J.K. Rowling because it is copyrighted material. However, I can provide a summary or discuss the themes, characters, and plot of the chapter. Would you like me to summarize it for you?

Aaron Swartz, while an infuriating tragedy, is antithetical to OpenAI's claim to transformation; he literally published documents that were behind a licensed paywall.
That is incorrect AFAIU. My understanding is that he was bulk downloading (using scripts) works he was entitled to access, as was any other student (the average student was not bulk downloading them, though).

As far as I know he never shared them, he was just caught hoarding them.

> he literally published documents that were behind a licensed paywall.

No, he did not do this [1]. I think you would need to read more about the actual case. The case was brought based on him downloading and scraping the data.

[1] https://en.wikipedia.org/wiki/United_States_v._Swartz

A more fundamental argument would be that OpenAI doesn't have a legal copy or license for all the works they are using. They are, for instance, obviously training on internet comments, which are copyrighted, and I am assuming not all legally licensed from the site owners (who usually have legalese in their terms of posting granting them a super-license to comments) or from the posters who made such comments. I'm also curious whether they've bothered to get legal copies or licenses for all the books they are using rather than just grabbing LibGen or whatever. The time commitment to track down a legal copy of every copyrighted work there would be quite significant even for a billion-dollar company.

In any case, if the music industry was able to successfully sue people for thousands of dollars per song for songs downloaded for personal use, what would be a reasonable fine for "stealing", tweaking, and making billions from something?