Hacker News
by Gigachad 109 days ago
Trying to protect a particular style is just unworkable for obvious reasons. The only solution I can think of is requiring AI companies to license all of the content they have in their training set, so artists get paid for the training, rather than trying to work out which source material links to which outputs, which is impossible.

When I buy a book, I don't buy a license to read it, I don't sign an EULA that says I won't scan it, digitize it, or write a program to analyze the word frequencies it contains. Do you want to buy a license to read a book? Because this is how you get there.
The law has always been able to recognize a distinction between Hunter S. Thompson reading Ernest Hemingway and learning from his style and a billion GPUs reading a billion books to be able to produce it on demand. It takes time for the law to catch up to the technology but it will.
You don't sign an EULA saying you can't do those things because scanning then distributing is already prohibited by copyright. The way to start a license war is to keep the status quo of these companies being able to ingest and essentially reproduce human work for free. One of my big worries about AI is that it will accelerate companies locking everything down and hoarding their own data.
I suspect it already has a dampening effect on individuals sharing. It leaves a bad taste in the mouth to know that anything you share intending to help fellow humans will immediately be ripped and profited from by companies that want to take your job and profit from it.
The old rules were built on old capabilities and an old reality which no longer exists.
This is a deceptive line of dismissal. Sound principles need to be figured out before imposing any kind of restriction on art - "things have changed" doesn't cut it.
Figuring things out is exactly what needs to happen. I think it is valid to dismiss arguments of “this is how copyright has always worked” when those rules were written before AI completely changed the game.
In Spain books include a copyright notice explicitly prohibiting reproduction and digitalization and alluding to article 270 of the Spanish criminal code.
The book can say anything it wants; whether it's true and/or applicable in court later on is a very different matter. Spain's SGAE is a very powerful lobby but still needs to follow the law.

Edit: haven't followed the law in a while, but you could definitely copy, digitalize and scan documents for yourself and your friends (copia privada).

In Spain EULAs cannot infringe upon the law either.
Perhaps it's that the transaction for you, an individual not explicitly profiting from the work, should be treated differently than a corporation using a work solely to profit from it.
Of course you don’t, because it’s not the EULA that enforces the copyright. Copyright law is what enforces the EULA. It’s right there in the fact it’s a Licensing Agreement.
The problem isn’t the reading. The problem is the output based on somebody else’s work.

There is a reason we call them styles: a style is a recognizable pattern someone came up with, maybe after decades of work.

The "funny" thing is that we absolutely allow people to copy style... but somehow software isn't allowed to do that?

You don't even need to have a legally acquired source material to produce work in a certain style.

The new reality allows for original creators to actually track the chain, so we're in this situation.

If people had the same mass output as software, we wouldn’t allow that either.

If one or two people take an apple from your tree it’s not a big deal, if a machine takes 10,000 it is.

When I buy a patented product I don't sign an EULA that says I can't manufacture and sell a copy, but I still can't manufacture and sell a copy.
It is not an individual buying the book but a corporation, with the purpose of being able to create imitations of it, and all other books.
It's even worse than that. You then have to pay an additional fee to use its ideas as inspiration for your own book.
Copyright quite literally protects the act of copying or reproducing a work protected by copyright. And you are technically entering into something akin to an end user licensing agreement when you buy a book, the only difference being that the EULA is incorporated into law on an international basis through reciprocal copyright treaties.

So if you scan a book you are making a copy. In some copyright jurisdictions this is allowed for individuals under a private copying exception - a copyright opt out, if you like - but the important thing is private use. In some jurisdictions there is also a fair use exception, which allows you to exploit the rights protected by copyright under certain circumstances, but fair use is quite specific and one big issue with fair use is that the rights you are exploiting cannot result in something that competes with the original work.

Other acts restricted by copyright include distribution, adaptation, performance, communication and rental.

So if you copy a book, digitize it, and write a program to analyze the word frequencies it contains you may, in some jurisdictions but not all, be allowed to do this.

If you’re doing it locally on your own machine you are simply copying it. If you do it in the cloud you are copying it and communicating the copy. If you copy it, analyze it and train an AI model on it that could be considered fair use in certain jurisdictions. Whether the outputs are adaptations of the training data is a matter of debate in the copyright community.

But importantly if you commercialise that model and the resulting outputs compete with the copyright protected material you used to train, your fair use argument may fail.

So when you buy a book you are actually party to what is effectively a licence granted by the copyright holder, albeit to the publisher. But as the end user of the book you are still restricted in what you can do with that copyright protected work, through a universal end user licence encoded in law.

The cumulative license fees required to properly compensate all artists is so absurd that it will probably genuinely burn down the entirety of global economy if paid. The only solution I can think of is to burn down just the AI, to be revisited later and rebuilt as a tool that won't require an absurd amount of training data, and that also leaves a lot more to its human operator beyond merely accepting literal categorical descriptions that are fundamentally tangential to the artistic value of the outputs.

And I think the same could happen to LLMs. If it took all the fossil fuel on Earth just to barely be able to drive a car to a car wash, there's more wrong with the car than with the oil price.

> is so absurd that it will probably genuinely burn down the entire global economy if paid.

Where did you get that idea? The global economy is ~200T/year PPP. 0.1% of that split across every artist you want the training data from would be insanely difficult for the vast majority of them to turn down. Which makes sense, as art isn’t that big a percentage of the global economy compared to, say, housing, food, medical care, infrastructure, military spending, etc.

Obviously the incentive to take without compensation is far more appealing, but that doesn’t mean it was impossible to make a reasonable offer.

For all the people represented in the training data to receive royalties would be an incredible wealth transfer to the Extremely Online. My forum posts, StackOverflow answers etc are also contributing to the model outputs. The training data, by volume, mostly belongs to blog authors, redditors, Wikipedia editors, to us!
The people in that counting to infinity subreddit would get compensated a lot if this were fully automated - their posts were so overrepresented in the training set that many of their usernames became complete tokens (e.g. SolidGoldMagikarp).
I object to calling people chatting online artists.

However, ultimately nobody is going to pay them more than the value of their posts to the AI company which puts a severe cap on what that’s actually worth. People who post a great deal of online content might be worth compensating a few thousand dollars, but it would be hard for them to then turn that down.

I think the lower bound for someone signing away rights to their whole art portfolio is closer to $1m than a few thousand. A few k is just a month's salary that they can "make" themselves. Offers that small would be almost off-putting.
That would be true if they gained the exclusive ability to reproduce works etc.

AI companies want a license to train not ownership of a portfolio.

Hey finally my reddit and hn habit can be lucrative!
There are definitely >1M artists worldwide, some popular, some less so, and $1M * 1M = $1T, not 0.1% * $200T = $200B.

A hard cap of 200B divided by 1M equals 200k, and that would surely be more reasonable, but we aren't hearing artists responding favorably to hypotheticals in that range, so I'm skeptical that "ain't nobody gonna turn that down".
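
For anyone who wants to check the figures in this subthread, here is a minimal Python sketch. Every input is a rough estimate quoted in the comments above, not a sourced statistic, and the constant and function names are made up for illustration. It treats ">1m artists" as exactly one million.

```python
# Rough figures quoted in the thread; none of these are sourced statistics.
GLOBAL_ECONOMY_USD = 200 * 10**12   # "~200T/year PPP"
LICENSE_SHARE_PER_MILLE = 1         # "0.1%" expressed as 1 per mille
ARTIST_COUNT = 10**6                # ">1m artists", assumed to be exactly 1M here
PER_ARTIST_ASK_USD = 10**6          # "$1m" per portfolio


def license_pool_usd(economy_usd: int, share_per_mille: int) -> int:
    """Total pool if a given per-mille share of the economy went to licensing."""
    return economy_usd * share_per_mille // 1000


def total_ask_usd(artist_count: int, ask_per_artist_usd: int) -> int:
    """Total cost if every artist were paid the same asking price."""
    return artist_count * ask_per_artist_usd


def payout_per_artist_usd(pool_usd: int, artist_count: int) -> int:
    """Even split of the pool across all artists."""
    return pool_usd // artist_count


pool = license_pool_usd(GLOBAL_ECONOMY_USD, LICENSE_SHARE_PER_MILLE)
ask = total_ask_usd(ARTIST_COUNT, PER_ARTIST_ASK_USD)
per_artist = payout_per_artist_usd(pool, ARTIST_COUNT)

print(f"pool at 0.1% of the economy: ${pool:,}")
print(f"total ask at $1M per artist: ${ask:,}")
print(f"even split of the pool:      ${per_artist:,} per artist")
print(f"ask is {ask // pool}x the pool")
```

Under those assumptions the $1M-per-artist ask is five times the 0.1% pool, which is the gap the two sides are arguing over.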

This isn’t ownership, it’s a license.

I think the vast majority would agree to let AI companies train on their art for 10k let alone 200k. Don’t forget the average global salary is way below what you see in the US.

Put another way how many people would turn down 6+ months salary. Of course the vanishingly tiny percentage of artists that people care about would want more, but that’s a separate question and not particularly valuable to AI companies.

> Put another way how many people would turn down 6+ months salary.

Didn't that exact social experiment take place in the US last year? I thought the result of that was disastrous if media reports are to be believed.

OTOH I remember the creator of Wordle closed the "low few mil" deal instantly, so I do believe it unlikely that people would turn down a few _hundred_ months' worth of salary. But those artists are not from regions with 50-100x lower median income and/or wider income distribution relative to the US. I think they're concentrated in relatively high-income, low-disparity regions, so I don't think there's some backwater with an abundant supply of artists where a lifetime income is equivalent to no more than 6 months' worth in the US.

And IMO those artists are basically engaged in a geo-scale dumping of media content. It's the same phenomenon as how moving consumer electronics manufacturing to the US instantly multiplies costs by small integers instead of just incurring premiums in percentages. If that phenomenon were to be quenched and those effects were integrated into the economy somehow, that would change the global balance of power to some statistically significant degree, like, we'd be seeing flying rocket amphibian McBoatfaces everywhere. That might be interesting, but I'm not sure if that's an interesting kind of interesting thing to see.

Wordle involved actually selling the rights to something, not just allowing AI to be trained on it while he kept the website.

That’s really not a reasonable comparison to what is being sought.

As to global artists, I was suggesting the majority of artists globally make ~20k USD or less per year as artists. To get to millions of artists you need to use a generous definition. Hollywood is full of actors, but how many of them made 20+k last year as an actor? If you disagree, fine, let’s double it; 6 months' salary is still only 20k, and would, I suspect, be a seriously tempting offer when you retain all rights to past and future works.

> The cumulative license fees required to properly compensate all artists is so absurd that it will probably genuinely burn down the entirety of global economy if paid.

That's kind of an interesting concept: "since the scale of my transgression was so big, I should get away with it scot-free."

That’s how eminent domain and regulatory takings work in most countries.
"If it took all the fossil fuel on Earth" What do you mean? To TRAIN an LLM model it takes roughly the same amount of energy as to raise a person, so it's not even really expensive in energy costs.