| HN Mirror



	by zug_zug 1091 days ago
	Hard to understand how this is a crime, or how they came up with 3 billion dollars of damage. Seems like if it's legal for a person to do it should be legal for software to do for the most part.

9 comments

data-ottawa 1091 days ago

I can personally memorize and recite copyrighted works all I want, but when ChatGPT does it then it’s in a commercial context and they’re liable to be sued for infringement.

If you ask ChatGPT the rules for D&D, the private sourcebooks are all in there.


pessimizer 1091 days ago

> I can personally memorize and recite copyrighted works all I want,

Whoever told you that is lying to you. You are not legally allowed to personally memorize and recite copyrighted works all you want, any more than you're allowed to personally memorize, write down copyrighted works, and distribute them as much as you want.

All piracy is a process of computer-assisted remembering and reciting.


DirkH 1091 days ago

Last I checked I can legally enter any bookstore with copyrighted books, pick up a book, and read it. And then tell anyone what I read.

I can't go write and commercialize what I learnt directly, but I'm not breaking the law by quickly seeing how some book I didn't buy ends so I can talk about it at a party - and then everyone knows how it ends which might affect whether they want to buy said book and upset the author. But, tough shit, what I did was legal. I can even use the ending as one set of input from dozens of inspirations for my own book where the end result is transformative enough where the sources are unrecognizable. And if I had learnt about the endings from a dozen books without buying those books I didn't break any laws even though I am now commercializing something in being inspired by them all to make something new.


anktor 1091 days ago

Maybe it would be useful to clarify what "tell anyone what I read" means. If you mean one person to a few in a room, then most likely yes. If you use any type of broadcasting, then most definitely no. Try reading a script from a recent movie out loud on Twitch/YouTube/radio/TV and see whether it gets DMCA'd or not. Same for books, songs I guess... not? But not sure.


DirkH 1089 days ago

I just mean I can socialize and talk about it without the police telling me that's illegal because I didn't pay to be allowed to talk about said plot points


mrtranscendence 1091 days ago

Well, you could memorize and recite copyrighted works all you want, as long as you're doing it in an empty room without anyone listening.


criddell 1091 days ago

Would you say reading a book to my kids before bed is illegal?


mrtranscendence 1091 days ago

Sorry, I was being a little flip. There's more to it than that, of course. Is the performance sufficiently transformative, is it educational, is it non-profit, etc.


brookst 1091 days ago

Not how copyright works.

Being non-commercial is not an automatic fair use exception. Being commercial does not preclude fair use. And rule concepts are not copyrightable, only the specific expression. Rules may have other IP protection, including patents.


pessimizer 1091 days ago

> Rules may have other IP protection, including patents.

That's not even true in the US anymore. You'd have to convert those rules into some sort of device, or argue that the game is a business method.


solardev 1091 days ago

Isn't that because a performance is different from the creation of a permanent copy? If you published an article and included a significant chunk of the copyrighted work, you'd be liable too unless it fell under fair use. Doesn't matter if you did it or ChatGPT. Commercial use would be one consideration, but not the only one, for both of you.

The rules of games cannot be copyrighted either. The artistic elements can be trademarked, but if ChatGPT merely explains the rules to you in different ways, that isn't infringement either.


hughesjj 1091 days ago

> and recite copyrighted works all I want

...wait, isn't that false? legitimately asking.

or is it because it was done by a corporation that makes it illegal?

I'm thinking of how restaurants don't sing Happy Birthday, and of fair use restrictions, etc.


bena 1091 days ago

Like most things, it depends.

If I recite them to myself, in my home, it's fine. If I do it at a gathering at my house where we're playing D&D, fine. If I do it as a performance, in front of a crowd, or as a recording, now I'm no longer fine. Context matters in copyright cases. Not to mention, to claim fair use, you do have to claim you violated copyright. Fair use is just an allowed violation.

As to Happy Birthday, that's actually OK for them to do now. The person/group that claimed the copyright to Happy Birthday was found not to have actually held it in the first place. Happy Birthday is actually an older song called "Good Morning to All". Swap "Good Morning" with "Happy Birthday" and "children" with "dear [PERSON]" and you have the lyrics. This was not deemed a substantive change. And since the copyright on "Good Morning to All" has lapsed, Happy Birthday is in the public domain.


data-ottawa 1090 days ago

Yes, I was overly broad and there are restrictions on saying/copying memorized material.


prng2021 1091 days ago

I don’t get your point. Whether you use copyrighted material in commercial context or not always matters. That’s one of the most important aspects of different open source licenses.


rpdillon 1091 days ago

This is not true for copyright law (the 4-factor test[0]) or for OSI licenses (they almost universally place no restrictions on commercial use). The only exception that comes to mind right now is the Creative Commons NC, which is generally recognized as being unsuitable for software[1].

[0]: https://fairuse.stanford.edu/overview/fair-use/four-factors/

[1]: https://creativecommons.org/faq/#can-i-apply-a-creative-comm...


ghaff 1090 days ago

And CC-NC isn't considered an open source license by the FSF or OSI anyway. And IMO the NC clause is pretty much impossible to define for non-trivial use, and Creative Commons basically gave up. Not sure non-derivatives is a lot better, especially given remixing was one of the original drivers behind CC, but it's at least less controversial.


prng2021 1091 days ago

Thanks, you’re right. I was thinking about the license changes Elastic made to stop cloud providers from redistributing their products as a managed service.


ghaff 1091 days ago

No OSI-approved open source license prohibits the commercial use of software. In fact, the Open Source Definition expressly forbids discriminating on the basis of how the software will be used.


brookst 1091 days ago

A license does not redefine copyright law.

I can give you a rock that I own, which I hope we all agree is not copyrightable, and ask you to sign a license that you will keep it indoors. If you put it in your yard, you are breaking the license and potentially liable. This has nothing to do with copyright.


goatlover 1091 days ago

Has this been decided by the courts?


codekansas 1091 days ago

> including personal information obtained without consent

Obtained from (checks notes) public internet forums

> For the 16 plaintiffs, the complaint indicates that they used ChatGPT, as well as other internet services like Reddit, and expected that their digital interactions would not be incorporated into an AI model.

You've got to be incredibly naive if you think public Reddit data isn't used to train ML models, not least by Reddit themselves


pessimizer 1091 days ago

Or maybe when you started posting on reddit, LLMs hadn't been invented yet. This is true for 99.9% of the people who post on Reddit.


codekansas 1091 days ago

People have been training ML models on data scraped from Reddit since at least 2015 [1], back when there were less than a million users

[1] https://www.kaggle.com/datasets/ehallmar/reddit-comment-scor...


jefftk 1091 days ago

LLMs were invented at least five years ago (BERT) though you could make the case for a few years earlier. My guess is the majority of Reddit users are new since then, not 0.1%?


pessimizer 1091 days ago

Your guess is that the majority of Reddit users have joined since 2018? 1) I do not think that is correct, 2) the mere existence of LLMs isn't public awareness about how LLMs are trained, and 3) you know exactly what I'm saying and that 99.9% might be slight hyperbole.


jefftk 1091 days ago

1: Reddit has ~1.6B monthly active users, compared to 0.3B in 2018. [1] So 2x user growth seems more likely to me than not.

2: You're the one who went with "invented" ;)

3: I know you're exaggerating, but I think you think you're exaggerating much less than you actually are.

[1] https://www.bankmycell.com/blog/number-of-reddit-users/


mrtranscendence 1091 days ago

> Your guess is that the majority of Reddit users have joined since 2018?

It's not really important to the debate around unlicensed use of copyrighted works to train AI models, but it wouldn't surprise me at all if the majority of Reddit users have joined since 2018. It's tough to get reliable active user counts, but they seem to have risen substantially over the past five years.

It also wouldn't surprise me if the majority of Reddit users were indeed from prior to 2018, but at the very least > 2018 would be a very substantial minority.


samstave 1091 days ago

My account(s) are 17 years old on reddit.


jefftk 1091 days ago

Yes? Mine is nearly that old. But we are very clearly the minority!


lionkor 1091 days ago

Like operating motor vehicles, carrying guns in some US states, suing people and companies, submitting content to Wikipedia, writing children's books, and writing and voting on laws?

Surely, there is some pretty large subset of things where "if it's legal for a person to do it should be legal for software" does not hold up?

So how about the default is "not allowed"


memefrog 1091 days ago

Hard to understand how someone can read the word 'sued' and think it has anything to do with criminal law.


safety1st 1091 days ago

Scraping is a bit of a legal gray area though. If you were to go scrape 300 billion words from the Internet, you probably would be committing a crime somewhere. Especially if you then reproduced some of those words verbatim for paying customers as ChatGPT does...

I am sure OpenAI thought all this through, so I can only assume they said "fuck it let's pull an Uber and do this anyway." We are in for lots of interesting legal headlines


pessimizer 1091 days ago

> Seems like if it's legal for a person to do it should be legal for software to do for the most part.

If you're going to make a claim this strong, you should expand on it. Should software be able to have custody of children? Should it be able to kill in self-defense? Should it be able to make 14th amendment claims? Exactly what part of the case (other than the damage claim) is hard to understand?


amelius 1091 days ago

> Seems like if it's legal for a person to do it should be legal for software to do for the most part.

It's legal for me to look out of the window and watch my neighbor go to the supermarket.

It's _not_ legal for me to build an automated surveillance system that tracks everybody on the street 24×7 and stores everything into a large database.


hbn 1091 days ago

I'd say this is more like if someone automated taking pictures of every flyer and missing pet poster people put up on a lightpole and saved it to a database.

There's more deliberate action when you post something on a public online forum than just existing in a place outside of your house. Especially considering you've always had the option to use Reddit anonymously anyway.


samstave 1091 days ago

>use reddit anon....

Read, yes - post no.

And - you can no longer create an account that is not tied to an email...


hbn 1090 days ago

OpenAI didn't have access to every poster's email when they crawled reddit. If you're making posts or have an account name that are easily tied back to your personal identity, that's on you. But you could make an account with any random username you wanted, that keeps you anonymous as far as OpenAI is concerned.


samstave 1090 days ago

My point was only that 17 years ago, and for more than a decade, Reddit required no email address to create an account, so it was truly anonymous. Now they tie all (new) accounts to emails, which makes it a trivial click for surveillance to ID your Reddit account...


codetrotter 1091 days ago

FTA:

> The lawsuit is seeking class-action certification and damages of $3 billion – though that figure is presumably a placeholder. Any actual damages would be determined if the plaintiffs prevail, based on the findings of the court.


paddw 1091 days ago

They're fishing.


samstave 1091 days ago

Likely hoping for whatever settlement they can squeeze out of OpenAI as the first such suit against them...

They picked 3B hoping to get several million...


ben_w 1091 days ago

If it genuinely makes them redundant and unemployable, a few million each seems "fair" in certain ways.

But that is a moral point, not a legal one; IANAL and can't say anything valuable about the legal merits.

Ideally AI makes us all redundant and the money stops mattering anything like as much, similar to how owning land stopped mattering anything like as much when the industrial revolution happened.

Regardless, I think this is a policy question rather than a legal question, even if this fight happens to be in a court.
