by zug_zug 1091 days ago
Hard to understand how this is a crime, or how they came up with 3 billion dollars of damage.

Seems like if it's legal for a person to do it should be legal for software to do for the most part.


I can personally memorize and recite copyrighted works all I want, but when ChatGPT does it then it’s in a commercial context and they’re liable to be sued for infringement.

If you ask ChatGPT the rules for D&D, the private sourcebooks are all in there.

> I can personally memorize and recite copyrighted works all I want,

Whoever told you that is lying to you. You are not legally allowed to personally memorize and recite copyrighted works all you want, any more than you're allowed to personally memorize, write down copyrighted works, and distribute them as much as you want.

All piracy is a process of computer-assisted remembering and reciting.

Last I checked I can legally enter any bookstore with copyrighted books, pick up a book, and read it. And then tell anyone what I read.

I can't go write and commercialize what I learnt directly, but I'm not breaking the law by quickly seeing how some book I didn't buy ends so I can talk about it at a party - and then everyone knows how it ends which might affect whether they want to buy said book and upset the author. But, tough shit, what I did was legal. I can even use the ending as one set of input from dozens of inspirations for my own book where the end result is transformative enough where the sources are unrecognizable. And if I had learnt about the endings from a dozen books without buying those books I didn't break any laws even though I am now commercializing something in being inspired by them all to make something new.

Maybe it would be useful to clarify what "tell anyone what I read" means. If you mean one-to-few in a room, then most likely yes. If you use any type of broadcasting, then most definitely no. Try reading a script from a recent movie out loud on twitch/youtube/radio/tv and see whether it gets DMCA'd or not. Same for books, songs I guess... not? But not sure.
I just mean I can socialize and talk about it without the police telling me that's illegal because I didn't pay to be allowed to talk about said plot points
Well, you could memorize and recite copyrighted works all you want, as long as you're doing it in an empty room without anyone listening.
Would you say reading a book to my kids before bed is illegal?
Sorry, I was being a little flip. There's more to it than that, of course. Is the performance sufficiently transformative, is it educational, is it non-profit, etc.
Not how copyright works.

Being non-commercial is not an automatic fair use exception. Being commercial does not preclude fair use. And rule concepts are not copyrightable, only the specific expression. Rules may have other IP protection, including patents.

> Rules may have other IP protection, including patents.

That's not even true in the US anymore. You'd have to convert those rules into some sort of device, or argue that the game is a business method.

Isn't that because a performance is different from the creation of a permanent copy? If you published an article and included a significant chunk of the copyrighted work, you'd be liable too unless it fell under fair use. Doesn't matter if you did it or ChatGPT. Commercial use would be one consideration, but not the only one, for both of you.

The rules of games cannot be copyrighted either. The artistic elements can be trademarked, but if ChatGPT merely explains the rules to you in different ways, that isn't infringement either.

> and recite copyrighted works all I want

...wait, isn't that false? legitimately asking.

or is it because it was done by a corporation that makes it illegal?

I'm thinking of how restaurants don't sing Happy Birthday, and fair use restrictions, etc.

Like most things, it depends.

If I recite them to myself, in my home, it's fine. If I do it at a gathering at my house where we're playing D&D, fine. If I do it as a performance, in front of a crowd, or as a recording, now I'm no longer fine. Context matters in copyright cases. Not to mention, to claim fair use, you do have to claim you violated copyright. Fair use is just an allowed violation.

As to Happy Birthday, that's actually ok for them to do now. The person/group that claimed the copyright to Happy Birthday was found not to have actually held it in the first place. Happy Birthday is actually an older song called "Good Morning to All". Swap "Good Morning" with "Happy Birthday" and "children" with "dear [PERSON]" and you have the lyrics. This was not deemed a substantive change. And since the copyright on "Good Morning to All" has lapsed, Happy Birthday is in the public domain.

Yes, I was overly broad and there are restrictions on saying/copying memorized material.
I don’t get your point. Whether you use copyrighted material in commercial context or not always matters. That’s one of the most important aspects of different open source licenses.
This is not true for copyright law (the 4-factor test[0]) or for OSI licenses (they almost universally place no restrictions on commercial use). The only exception that comes to mind right now is the Creative Commons NC, which is generally recognized as being unsuitable for software[1].

[0]: https://fairuse.stanford.edu/overview/fair-use/four-factors/

[1]: https://creativecommons.org/faq/#can-i-apply-a-creative-comm...

And CC-NC isn't considered an open source license by the FSF or OSI anyway. And IMO the NC clause is pretty much impossible to define for non-trivial use and Creative Commons basically came up. Not sure non-derivatives is a lot better especially given remixing was one of the original drivers behind CC but it's at least less controversial.
Thanks, you’re right. I was thinking about the license changes Elastic made to stop cloud providers from redistributing their products as a managed service.
No OSI-approved open source license prohibits the commercial use of software. In fact, the Open Source Definition expressly forbids discriminating on the basis of how the software will be used.
A license does not redefine copyright law.

I can give you a rock that I own, which I hope we all agree is not copyrightable, and ask you to sign a license that you will keep it indoors. If you put it in your yard, you are breaking the license and potentially liable. This has nothing to do with copyright.

Has this been decided by the courts?
> including personal information obtained without consent

Obtained from (check notes) public internet forums

> For the 16 plaintiffs, the complaint indicates that they used ChatGPT, as well as other internet services like Reddit, and expected that their digital interactions would not be incorporated into an AI model.

You've got to be incredibly naive if you think public Reddit data isn't used to train ML models, not least by Reddit themselves

Or maybe when you started posting on reddit, LLMs hadn't been invented yet. This is true for 99.9% of the people who post on Reddit.
People have been training ML models on data scraped from Reddit since at least 2015 [1], back when there were less than a million users

[1] https://www.kaggle.com/datasets/ehallmar/reddit-comment-scor...

LLMs were invented at least five years ago (BERT) though you could make the case for a few years earlier. My guess is the majority of Reddit users are new since then, not 0.1%?
Your guess is that the majority of Reddit users have joined since 2018? 1) I do not think that is correct, 2) the mere existence of LLMs isn't public awareness about how LLMs are trained, and 3) you know exactly what I'm saying and that 99.9% might be slight hyperbole.
1: Reddit has ~1.6B monthly active users, compared to 0.3B in 2018. [1] So 2x user growth seems more likely to me than not.

2: You're the one who went with "invented" ;)

3: I know you're exaggerating, but I think you think you're exaggerating much less than you actually are.

[1] https://www.bankmycell.com/blog/number-of-reddit-users/

> Your guess is that the majority of Reddit users have joined since 2018?

It's not really important to the debate around unlicensed use of copyrighted works to train AI models, but it wouldn't surprise me at all if the majority of Reddit users have joined since 2018. It's tough to get reliable active user counts, but they seem to have risen substantially over the past five years.

It also wouldn't surprise me if the majority of Reddit users were indeed from prior to 2018, but at the very least > 2018 would be a very substantial minority.

My account(s) are 17 years old on reddit.
Yes? Mine is nearly that old. But we are very clearly the minority!
Like operating motor vehicles, carrying guns in some US states, suing people and companies, submitting content to Wikipedia, writing children's books, and writing and voting on laws?

Surely, there is some pretty large subset of things where "if it's legal for a person to do it should be legal for software" does not hold up?

So how about the default is "not allowed"

Hard to understand how someone can read the word 'sued' and think it has anything to do with criminal law.
Scraping is a bit of a legal gray area though. If you were to go scrape 300 billion words from the Internet, you probably would be committing a crime somewhere. Especially if you then reproduced some of those words verbatim for paying customers as ChatGPT does...

I am sure OpenAI thought all this through, so I can only assume they said "fuck it let's pull an Uber and do this anyway." We are in for lots of interesting legal headlines

> Seems like if it's legal for a person to do it should be legal for software to do for the most part.

If you're going to make a claim this strong, you should expand on it. Should software be able to have custody of children? Should it be able to kill in self-defense? Should it be able to make 14th amendment claims? Exactly what part of the case (other than the damage claim) is hard to understand?

> Seems like if it's legal for a person to do it should be legal for software to do for the most part.

It's legal for me to look out of the window and watch my neighbor go to the supermarket.

It's _not_ legal for me to build an automated surveillance system that tracks everybody on the street 24×7 and stores everything into a large database.

I'd say this is more like if someone automated taking pictures of every flyer and missing pet poster people put up on a lightpole and saved it to a database.

There's more deliberate action when you post something on a public online forum than just existing in a place outside of your house. Especially considering you've always had the option to use reddit anonymously anyway.

>use reddit anon....

Read, yes - post no.

And - you can no longer create an account that is not tied to an email...

OpenAI didn't have access to every poster's email when they crawled reddit. If you're making posts or have an account name that are easily tied back to your personal identity, that's on you. But you could make an account with any random username you wanted, that keeps you anonymous as far as OpenAI is concerned.
My point was only that 17 years ago, and for more than a decade, reddit required no email address to create an account... so it was truly anon... Now they tie all (new) accounts to emails, which makes it a trivial click for surveillance to ID your reddit account...
FTA:

> The lawsuit is seeking class-action certification and damages of $3 billion – though that figure is presumably a placeholder. Any actual damages would be determined if the plaintiffs prevail, based on the findings of the court.

They're fishing.
Likely hoping for whatever settlement they can squeeze out of OpenAI as the first such suit against them...

They picked 3B hoping to get several million...

If it genuinely makes them redundant and unemployable, a few million each seems "fair" in certain ways.

But that is a moral point, not a legal one; IANAL and can't say anything valuable about the legal merits.

Ideally AI makes us all redundant and the money stops mattering anything like as much, similar to how owning land stopped mattering anything like as much when the industrial revolution happened.

Regardless, I think this is a policy question rather than a legal question, even if this fight happens to be in a court.