by AequitasOmnibus 895 days ago
Copyright was never conceived to apply to technology like this and the onslaught of copyright suits (like the NYT one) underscores its fundamental rent-seeking nature. No doubt these latest changes to GPT-4 are in response to the suits they’re presently fighting. However these cases are ultimately resolved, the end-user will be the biggest loser.

People generate all of the data going into the system and then the middle-men (OpenAI, Microsoft, Google, Big Tech middle-man of the week) reap a disproportionate centralized benefit. That causes a bigger problem than the so-called rent-seeking behavior of copyright holders in this case, as this has the net effect of leveraging human creativity, etc. to devalue it and continue the erosion of the middle class.

Bad things happen when you let middlemen get the upper hand, like the American health care system, or big finance disconnected from the real economy. I'll vote against the middleman every time in favor of the original value creator, because society goes down the toilet when middlemen win.

What is the alternative though? I agree with the feelings and sentiments of the anti-AI people who want it to pay for copyright, but I never hear any consideration of what comes next.

This is going to end up being the music industry all over again. It's going to be impossible for any individuals or small companies to get the rights needed, and instead we're going to get massive content labels selling the rights, or only giant corporations being able to hop through all these new hoops.

We don't want a repeat of that as a society, creating yet another leeching middleman and horrible industry favoring only the incumbents.

I don’t see it ending like that. LLMs will just be taught not to emit copyrighted content verbatim. Whatever the courts end up deciding, they’ll be trained to stay just this side of legal. I’m certain it’s already being worked on.
Yes, when I read "rent-seeking" I assumed OP meant OpenAI.

Google search at least was just a link to content we wrote. OpenAI just steals it.

OP was obviously referring to the copyright holders whose data he feels so entitled to.
Open models are a thing. Rather than attacking the technology (which is great) with litigation to hurt a few bad actors, we should attack the capitalist rules that enable rent-seeking middleman parasites to flourish.
Yes. If you think about it, the individual is being subjected to a man in the middle attack, cleaving a creator from their creation via the use of consent agreements for providing a platform. Rent seeking.
The artist or author might end up being the loser, and the multi billion corporation harvesting their work might make an unearned profit off it.

To me personally it's crazy how many people think that we would be better off without any kind of copyright protection. Copyright solves many real world problems and protects people against having a company profit off their work... but as soon as AI is involved so many people start to advocate for throwing it away.

If companies are required to purchase licenses for everything they train on, it will guarantee that only huge corporations with deep pockets can produce powerful models. Microsoft will be slightly inconvenienced, Stability AI will be destroyed. Some artists might get a payday, but most of the money will go to companies with large copyright libraries like Getty. The general quality of all models will decrease. I don't see any other possible outcome.
Almost a year ago, I made¹ the following prediction:

It looks like to me that many companies want to use the new generative tools, and many others want it not to impact their stake in the copyright system. I’m pretty sure they will both come to a compromise which will leave most users without any benefits, either from reduced copyrights or from availability of generative tools. It’s what would make both powerful parties satisfied (if not happy), and will impact the status quo the least.

Say, for instance, that they instituted a mostly mandatory licensing scheme, so that an individual artist had no choice but to allow use of their art as input when creating generative tools. People using art in this way have to pay a rather high licensing fee, but it is not paid to the artist, but to some sort of central copyright office. Huge copyright holders can also pay an exorbitantly high fee (to the same recipient) to opt out of licensing. Win-win-win; Existing copyright holders keep their existing copyrights, only large-ish actors can create new generative tools, new political positions and institutions are created with lots of money flowing in. Of course, artists then get screwed by being co-opted by generative tools which they can never afford to create themselves, and the general public get robbed both of the opportunity of using and creating new generative tools, and of any less restrictive copyright law.

1. <https://news.ycombinator.com/item?id=35191112>

For music there are already similar mechanisms in place in many countries: in Poland it's ZAiKS, in the US it's ASCAP. They collect fees from organisations playing copyrighted music publicly.

(I agree that it would be terrible if they began enforcing other copyrighted content and for training purposes, because it would lead to centralisation)

Sacem in France.

They're the worst, e.g. they will notoriously come after you even if you play public domain music.

I hope you’re wrong, but I think you’re right.
In agreement with your "slightly inconvenienced": The world's dozen or so largest publishers have market caps averaging below $10bn each.

"Even" just OpenAI alone could pocket a few of them if they need easy sources of acquiring content.

This includes the largest educational publishers. And while these publishers do not own all their content, the reality is that most authors earn so little that an "allow AI training on my work for $x extra" offer would give them vast amounts of content.

As for Getty, Getty has a market cap of "only" $2bn. The big players will easily afford to build or buy libraries like that.

But of course it will be the end of decent open models.

> it will guarantee that only huge corporations with deep pockets can produce powerful models

It will also guarantee that the financial means to continue making that data, that is clearly so important, would be preserved. Someone has to pay for the crafting of the data.

For many artists this is not about "getting a payday" and is instead about "not being replaced by AI". So the outcome you describe would probably sound great to those artists.
How did dock workers feel when containerized shipping started gaining popularity? Should we have let them all continue putting things on ships piece by piece and stacking and unstacking each shipment by hand?

How did portrait artists feel when photography was gaining popularity? Should we have let them control the industry so that if we want to record a memory of a person we must have them stand or sit for hours while someone draws them?

etc.

Man there's always someone in these discussions who will smugly tell us that this is all inevitable and our empathy for the creatives in our economy is misplaced. To you I give a hearty fuck you.
No, I am describing what happens when technology makes the market for certain jobs and talents change. The stevedores may have had a bad time for a while but our modern society only exists because we can ship things quickly and efficiently.

I feel bad for copy editors and people who write corporate blog posts or design logos or come up with ad jingles, but their niche is gone now and they need to adapt.

Thanks for being respectful and cordial though.

The good artists are already using AI, just like they photobashed, traced templates and used camera obscuras to produce better art faster down through the ages. A true artist transcends medium to focus on message.
AI is a tool. Different artists use different tools. Some good artists use AI. Many good artists will not be interested in that particular tool.
I don't think most people believe we are better off without copyright. I think people believe that copyright protects specific concrete expressions and that fair use exists to allow others to build on ideas in transformative ways. It's not clear where building a learning model from this work sits in this context, hence the court cases.

Also, it's a subtle difference, but copyright is not intended to solve the problem of companies profiting off of artists' works; it is intended to promote the progress of science and useful arts. It attempts to do this by giving creators limited exclusive rights.

How does locking away most of the knowledge, research and learning materials in the private vaults of a few publishing houses for their personal profit promote the progress of science, I wonder?

Even scientists are tired of the predatory and rent seeking behaviour of the publishers they have fallen prey to and are looking for any way out.

This is not promoting progress; this is the opposite of it.

I think it grossly mischaracterizes what copyright protects to describe it as "most of the knowledge, research and learning materials". Still, I agree that the extensions of copyright length and the behavior and incentives of publishers work against the original intent of copyright. Having said that, publishers only have control of copyright because authors give it to them. Copyright rests with the creator; the system where people are compelled to sign this over to publishers is a different (but of course related) problem. Scientists who are tired of the predatory behavior of publishers have other choices today. It's not clear what alternative you are proposing.
> vaults of a few publishing houses for their personal profit

Because they made it, it wouldn't exist without them, and others value it. If this data wasn't objectively valuable, we wouldn't be having this discussion.

> but as soon as AI is involved so many people start to advocate for throwing it away

No, I've been hearing it for years.

Don't try to portray people who hold this opinion as AI zealots.

It's brought up in discussions about torrents, Disney, streaming platforms, music, etc.

Yes. I've been aware of the intellectual property debate at least back to the great crackdown on sampling around when Paul's Boutique was released. And following it in depth from around the time Lawrence Lessig made arguments to the Supreme Court.

A large chunk of the tech community was following that case and most on HN seemed to be highly sceptical of the current status quo.

How does it protect a small artist against a large corporation profiting off their work?

I don’t even have the means to start litigation, let alone see it through.

It only protects those who are already moneyed and/or famous enough to negatively impact a large corporation’s reputation - and even in those cases it’s mostly for the benefit of the lawyers and bureaucrats who make a living off it.

If you register your work, which requires some effort but is not prohibitively expensive or difficult, you can sue for statutory damages, which are substantial enough (up to $150k for willful infringement) that lawyers will work on contingency. There are many individual artists who have been successful here. The law actually has some real teeth that individuals can use to protect their work.
It would be nice if there was a preventative concept, where the role of the creator being a predator, seeking and suing, would be mostly reversed, so that others would instead ask for permission, and maybe get the rights to copies through a fair exchange of money, like a license. We could call this "copy rights".
> The artist or author might end up being the loser, and the multi billion corporation harvesting their work might make an unearned profit off it.

Exactly like before AI, you mean? Except instead of OpenAI it was Disney, Universal and other large corporations in that same seat.

> To me personally it's crazy how many people think that we would be better off without any kind of copyright protection.

Why exactly should I care that the old billionaire copyright corps are dying? What would I gain from defending them? As far as I remember, what they did was privatize culture for their own benefit, and they even had a large negative influence on tech.

The copyright system, being so unequal and skewed towards multi-billion companies, dug its own grave.

The copyright issue seems unchanged. Anyone taking wholesale quotes from another entity is likely in violation of copyright law. If someone uses AI, and posts the output from it as their own work, and that work contains copyrighted material, the person who posted it is in violation of copyright. AI is just a tool they chose to use and they remain responsible for remaining in compliance with copyright law.

What we need is a reasonable way for people using AI to determine which parts of the text or images they have are subject to copyright.

Just a tool that required billions of dollars worth of copyrighted material to be created.
How can you possibly argue that taking a bunch of text and creating an application that creates text isn't transformative?

The tool itself unambiguously is fair use.

Whether something is transformative is one of 4 tests for fair use.
> Anyone taking wholesale quotes from another entity is likely in violation of copyright law

What do you mean anyone?

Is Sony liable when you play an entire movie on their TV? Is Nuance liable when you use their Dragon screen reader to verbalize an entire NYT article? Is Google liable when you display an entire webpage in Google Chrome? How about if you switch to Dark Mode, is that a transformative use?

Why would AI be any different? It’s just a tool at the end of the day!

The problem is people at large companies creating these AI models, wanting the freedom to copy artists’ works when using it, but these large companies also want to keep copyright protection intact, for their regular business activities. They want to eat the cake and have it too. And they are arguing for essentially eliminating copyright for their specific purpose and convenience, when copyright has virtually never been loosened for the public’s convenience, even when the exceptions the public asks for are often minor and laudable. If these companies were to argue that copyright should be eliminated because of this new technology, I might not object. But now that they come and ask… no, they pretend to already have, a copyright exception for their specific use, I will happily turn around and use their own copyright maximalist arguments against them.

(Copied from a comment of mine written over a year ago: <https://news.ycombinator.com/item?id=33582047>)

> these large companies also want to keep copyright protection intact, for their regular business activities

Care to share an example? I haven't heard of OpenAI or anyone else arguing for that, or trying to sue anyone for violating their copyright. If anything, their business decisions rely on the assumption that copyright will not help them protect their work.

Prime example for you right here:

https://nypost.com/2023/12/18/business/openai-suspends-byted...

100% pure unadulterated hypocrisy from "OpenAI".

T&C yes, but not copyright. This is fully consistent with them opposing copyright and not opposing paywalls/api limitations.
Don't they have an explicit T&C that says you are not allowed to use their output for training other models?
T&C yes, but not copyright.
I was mostly thinking of large companies also creating their own AI, like Google, Microsoft, etc.
If their model was leaked, you can be sure they’d claim copyright protection on it.
I wanted to say that they are too smart to expect the DMCA to protect them.

But then, I think that surely they would use copyright to block competition from using their model directly.

Because ChatGPT users are the only people that are worth considering.
OpenAI is the force cutting a slice from the copyright pie that the big copyright hoarders hold. The hoarders will not strike back to try to kill OpenAI's business, because in any case they will not be able to kill the technology itself. So, obviously, it's better for them to have OpenAI as a partner and share some profit with them to control the AI field than to kill this one and wait for another AI menace to arise.

OpenAI is not the one who would kill copyright. They just want their cut.

Why should 'big tech' corporations be allowed to use AI to remix/mash-up human-generated content all of a sudden when creative individuals have generally been prohibited from doing it for so long?
Wow, I didn't know creative individuals have been banned from remixing copyrighted material in their own private works.

We must tell the millions of kids who doodle characters in their notebooks that this is prohibited.

The data is not the technology.
I'm sure you'd feel the same way if it was your life's work these systems were hoovering up and regurgitating.