Hacker News
by oraphalous 517 days ago
I don't even understand why it's everyone else's problem to opt-out.

Eventually - for how many of these AI companies would a person have to track down their opt-out processes just to protect their work from AI? That's crazy.

OpenAI should be contacting every single one and asking for permission - like everyone has to in order to use a person's work. How they are getting away with this is beyond me.


Copyright doesn't prevent anyone from "using" a person's work. You can use copyrighted material all day long without a license or penalty. In particular, anyone is allowed to learn from copyrighted material by reading, hearing, or seeing it.

Copyright is intended to prevent everyone from copying a person's work. That's a very different thing.

There is an argument to be made that ChatGPT mildly rewording/misquoting info directly from my blog is copying.
And it is. And you can sue them for that. What you can’t do is get upset they (or their AI) read it.
Sure, but that's a different claim and a different argument.
I think to make that argument you would need evidence that someone prompted ChatGPT to reword/misquote info directly from your blog, at which point the argument would be that that person is rewording/misquoting info directly from your blog, not ChatGPT.
I don't think so: The user is merely making a request for copyrighted material, which is not itself infringing, even if their request was extremely specific and their intent was obvious.

OpenAI would be the company actually committing the infringement and providing the copy in order to satisfy the request.

If the law suddenly worked the other way around, companies would no longer be able to prosecute people for hosting pirated content online, because the responsibility would lie with the users choosing to initiate the download.

That would fall under fair use.

Legally, you'd struggle to prove any form of infringement happened. Making a copy is fine. Distributing copies is what infringes. You'd need to prove that is happening.

That's why there aren't a lot of court cases from pissed off copyright holders with deep pockets demanding compensation.

> Copyright doesn't prevent anyone from "using" a person's work.

It should. The 'free and open internet' is finished because nobody is going to want to subject their IP to rampant laundering that makes someone else rich.

Tragedy of the commons.

I can see this both ways. For the sake of argument, please explain why using IP to train an AI is evil, but using the same IP to train a human is good.

Note that humans use someone else's IP to get rich all the time. E.g. Doctors reading medical textbooks.

>Note that humans use someone else's IP to get rich all the time. E.g. Doctors reading medical textbooks.

You need a better example: a textbook was created for the exact purpose of sharing knowledge with the reader.

My second point: if you write a poem and I read it, memorize it, then publish it as my own with some slight changes, would you be upset?

If I get your painting, then use a script to apply a small filter to it and sell it as my own, is this legal? Is my script "creative"?

These AIs are not really creative; they just mix inputs and then interpolate an answer. In some cases you can't guess what input image/text was used, but in other cases it was shown exactly which source was used and just copy-pasted into the answer.

> My second point: if you write a poem and I read it, memorize it, then publish it as my own with some slight changes, would you be upset?

I feel the problem with analogizing to humans while trying to make a point against unlicensed machine learning is that applying the same moral/legal rules as we do to humans to generative models (learning is not infringement, output is only infringement if it's a substantially similar copy of a protected work, and infringement may still be covered by fair use) would be a very favorable outcome for machine learning.

> they just mix inputs and then interpolate an answer. In some cases you can't guess what input image/text was used

Even if you actually interpolated some set of inputs (which is not how diffusion models or transformers work), without substantial similarity to a protected work you're in the clear.

> Is my script "creative"? [...] These AIs are not really creative [...]

There's no requirement for creativity - even traditional algorithms can make modifications such that the result lacks substantial similarity and thus is not copyright infringement, or is covered by fair use due to being transformative.

>I feel the problem with analogizing to humans while trying to make a point against unlicensed machine learning is that applying the same moral/legal rules as we do to humans to generative models (learning is not infringement, output is only infringement if it's a substantially similar copy of a protected work, and infringement may still be covered by fair use) would be a very favorable outcome for machine learning.

Agreed. Copyright is clear, so if I can make ChatGPT output copyrighted material then OpenAI should pay me, correct? Or will you claim that this is rare, a mistake, and that we should forgive OpenAI, while a human would have had to pay damages?

Is the AI allowed to decide unprompted how to spend the money? Can it decide that it doesn't like the people who made it and donate it to charity? Can the AI start its own company and not hire anyone that made it? Can the AI decide that it prefers the open Internet and will answer all questions for free?

The sake of argument is a coward's way of expressing an unpopular opinion in public. Join a debate club if you're actually being genuine.

I never used the word evil.

That said, machines don't have natural rights, and you don't get to use them to violate mine.

scale
Under this mentality, every search engine index would be shut down.
cool
Napster had a moment too, but then they got steamrolled in court.

Courts are slow, so it seems like nothing is happening, but there’s tons of cases in the pipeline.

The media industry has forced many tech firms to bend the knee, OpenAI will follow suit. Nobody rips off Disney IP and lives to tell the tale.

If your business model depends on the Roberts' court kneecapping AI, pivot. Training does not constitute "copying" under copyright law because it involves the creation of intermediate, non-expressive data abstractions that do not reproduce or communicate the copyrighted work's original expression. This process aligns with fair use principles, as it is transformative, serves a distinct purpose (machine learning innovation), and does not usurp the market for the original work.
I believe there are issues other than just "is it transformative".

I can't take an Andy Warhol painting, modify it in some way and then claim it's my own original work. I have some obligation to say "Yeah, I used a Warhol painting as the basis for it".

Similarly, I can't take a sample of a Taylor Swift song and use it myself in my own music - I have to give Taylor credit, and probably some portion of the revenue too.

There's also still the issue that some LLMs and (I believe) image generation AI models have regurgitated works from their training models - in whole or part.

>I can't take an Andy Warhol painting, modify it in some way and then claim it's my own original work. I have some obligation to say "Yeah, I used a Warhol painting as the basis for it".

If you don't replicate Warhol's painting entirely, then you are fine. It's original work.

The number of sci-fi novels I read that are just an older concept reimagined with more modern characters is huge.

>I can't take an Andy Warhol painting, modify it in some way and then claim it's my own original work. I have some obligation to say "Yeah, I used a Warhol painting as the basis for it".

In most sane jurisdictions you can sample other work. Consider collage. It is usually a fair use exemption outside of the USA. If LLMs cause keyboard warriors to develop some seppocentric mindvirus leading to the destruction of collage I will be pissed.

>There's also still the issue that some LLMs and (I believe) image generation AI models have regurgitated works from their training models - in whole or part.

Considered a high-priority bug and stamped out. Usually it's in part because a feature is common to all of an artist's work, like their signature.

> I can't take an Andy Warhol painting, modify it in some way and then claim it's my own original work.

This is a hilarious choice of artist given that Warhol is FAMOUS for appropriating work of others without payment, modifying it in some way, and then turning around and selling it for tons of money. That was the entire basis of a lot of his artistic practice. There was even a Supreme Court case about it.

There was a time when it did not usurp the market for the original work, but as the technology improves and becomes more accessible, that seems to be changing.
In my experience, when existing laws allow an outcome that causes significant enough harm to groups with influence, the laws get changed.
> Training does not constitute "copying" under copyright law

It should.

And yet Mickey Mouse is in the public domain. Something those of us who remember the 90s thought would never happen.
Just the oldest Mickey. They gave up on it because the cost/benefit wasn't deemed worth it anymore.
> I don't even understand why it's everyone else's problem to opt-out.

Because the work being done, from the point of view of people who believe they are on the verge of creating AGI, is arguably more important than copyright.

Less controversially: if the courts determine that training an ML model is not fair use, then anyone who respects copyright law will end up with an uncompetitive model. As will anyone operating in a country where the laws force them to do so. So don't expect the large players to walk away without putting up a massive fight.

Of note here: the reason it's "important" is that it will make a shit-ton of money.
That, coupled with the obvious ideological motivations. Success could alter the course of human history, maybe even for the better.

If you feel that what you're doing is that important, you're not going to let copyright law get in the way, and it would be silly to expect you to.

I can't say I believe that. If that were the case, they'd focus more on results and less on hyping up the next underwhelming generation.
For one thing, they are focused on money because they need lots of it to do what they're doing.

For another, the o1-pro (and presumably o3) models are not "underwhelming" except to those who haven't tried them, or those who have an axe to grind. Serious progress is being made at an impressive pace... but again, it isn't coming for free.

Oh please. OpenAI and I guess every other AI company are for-profit.

The only change they are motivated by is their bank balances. If this were a less useful tool they’d still be motivated to ignore laws and exploit others.

Hard to say what motivates them, from the outside looking in. There have been signs of cultlike behavior before, such as the way the rank and file instantly lined up behind Altman when he was fired. You don't see that at Boeing or Microsoft.

Obviously it's a highly-commercial endeavor, which is why they are trying so hard to back away from the whole non-profit concept. But that's largely orthogonal to the question of whether they feel they are doing things for the benefit of humanity that are profound enough to justify blowing off copyright law.

Especially given that only HN'ers are 100% certain that training a model is infringement. In the real world, this is not a settled question. Why worry about obeying laws that don't even exist yet?

> Hard to say what motivates them, from the outside looking in.

It isn't.

> There have been signs of cultlike behavior before, such as the way the rank and file instantly lined up behind Altman when he was fired.

This only reinforces that the real drive is money.

>Especially given that only HN'ers are 100% certain that training a model is infringement. In the real world, this is not a settled question. Why worry about obeying laws that don't even exist yet?

This is exactly why people are against it.

Your argument is that there is no definitive law. Therefore the creators of the data you scrape to train on, and their wishes, are irrelevant.

If the motivation were to help humanity, they'd think twice about stepping on the toes of the humanity they want to save, and we'd hear more about nontrivial uses.

> OpenAI should be contacting every single one and asking for permission - like everyone has to in order to use a person's work

This is the problem of thinking that everyone “has” to do something.

I assure you that I (and you) can use someone else’s work without asking for permission.

Will there be consequences? Perhaps.

Is the risk of the consequences enough to get me to ask for permission? Perhaps.

Am I a nice enough guy to feel like I should do the right thing and ask for permission? Perhaps.

Is everyone like me? No.

> How they are getting away with this is beyond me.

Is it really beyond you?

I think it’s pretty clear.

They’re powerful enough that the political will to hold them accountable is nonexistent.