by vohk 925 days ago
While I somewhat agree with that take on copyright, I think you have to pick a lane to keep that position coherent:

Either you insist that copyright must be respected at every level, and the creators of material used for training deserve appropriate compensation, or

You throw out copyright completely in this context, but that means the resulting models cannot be treated as proprietary either unless they were produced using absolutely no unlicensed training data.

I think there is an argument for both. Want to create a proprietary model for commercial use? Pay up. Creating an open source, copyleft project exclusively for personal use and artistic expression? Exemption.

The current status quo is perfectly described by powerful corpos extracting rent. Billions for themselves and pennies for the average artist.


> Either you insist that copyright must be respected at every level, and the creators of material used for training deserve appropriate compensation

I don't think current copyright law automatically entitles people to royalties from something like AI-generated imagery. The dichotomy you've presented here isn't pro-copyright vs. anti-copyright, but "so pro-copyright that they argue for expanding the current laws" vs. not.

> Want to create a proprietary model for commercial use? Pay up. Creating an open source, copyleft project exclusively for personal use and artistic expression? Exemption.

That definitely benefits all the "powerful corpos" you've mentioned here. Now, Disney, Adobe, Meta etc. can use a fraction of their money to get all the data they would ever need and be the sole profiteers, while all newcomers will face an impassable barrier to entry that prevents them from ever threatening the existing players.

Copyright already has a lot of limitations. It has never been absolute, nor was it intended to be, because the point is to promote the arts, and too strict a grant of rights would stifle them instead. Indeed, in most places it is accepted that copyright is a significant limitation on the liberty of society at large, justified (or not, depending on one's opinions) by encouraging more works. Accepting it as a restriction means there is some degree of acceptance that it should not be more expansive than it needs to be (and many will disagree about whether the current length of copyright is more expansive than it needs to be).

The only limitation needed for training on copyrighted works not to be infringing is to accept that extracting information about a work is not infringing as long as copyrighted elements of the work itself are not significantly reproduced.

There is at least one middle ground area where you acknowledge that copyright and intellectual property restrictions should be removed, but that we should also recognize that all of the existing work was created by artists who expected they would have copyright protection. We should in my view not take from artists without their consent, and there is no implied consent when their works were posted at a time they believed they were protected by copyright.

This would mean we have to do a few difficult and worthwhile things: explicitly dismantle the copyright system, encourage artists to donate their existing works to the commons, and then only make datasets based on legally collected information. This would also have the side effect of encouraging the development of new training techniques and model designs which are more sample efficient.

I am afraid that what we will do instead is allow some erosion of copyright for small creators without dismantling the power large intellectual property holders have over the rest of us.

> We should in my view not take from artists without their consent

I think "take" is the wrong word here. Nobody is republishing the copyrighted works; instead, the model gets a gradient update. The update is shaped exactly like the model itself, and it gets stacked up with updates from other examples. It doesn't look like the original work at all: the original work was a picture or a book, while the gradients are a set of floating-point tensors. AI models decompose inputs into basic concepts; they don't copy like BitTorrent.
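
To make the "shaped exactly like the model" point concrete, here is a rough sketch with a toy three-weight linear model. Everything in it (the function names, the data, the learning rate) is made up for illustration, and a real image or language model is vastly larger, so treat it as a picture of the mechanics only, not as evidence about what a trained model can or cannot reproduce.

```python
# Toy linear model: prediction = sum(w_i * x_i).
# All names and numbers here are hypothetical, chosen only for illustration.

def gradient(weights, features, target):
    """Gradient of the squared error 0.5 * (pred - target)**2 w.r.t. each weight."""
    pred = sum(w * x for w, x in zip(weights, features))
    error = pred - target
    return [error * x for x in features]

def train_step(weights, batch, lr=0.1):
    """Sum the per-example gradients, average them, and take one descent step."""
    total = [0.0] * len(weights)
    for features, target in batch:
        g = gradient(weights, features, target)
        # Each example's gradient has one entry per weight: same shape as the model.
        assert len(g) == len(weights)
        total = [t + gi for t, gi in zip(total, g)]
    return [w - lr * t / len(batch) for w, t in zip(weights, total)]

weights = [0.0, 0.0, 0.0]
batch = [([1.0, 2.0, 0.0], 1.0), ([0.0, 1.0, 3.0], 2.0)]
new_weights = train_step(weights, batch)
print(len(new_weights) == len(weights))
```

Each training example contributes a list of numbers with one entry per weight, and those lists are summed before the weights move, so after the step there is no separate record of which example pushed which weight.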

Why should an AI not be allowed to form a full world model that includes all published works? It's not like the authors can use copyright to stop anyone from seeing their works; they never had a right to stop others from seeing them.

I am more arguing that if it’s considered taking, we should follow the path I recommend.

Whether or not it is taking is more nuanced, but I will say I’m not sympathetic to the idea that it’s broadly similar to a human looking at the work. It’s just very, very different. You can’t spin up a copy of a human on a cloud server and make them work 24/7.

I would expect that as laypeople we aren’t equipped to reason about this effectively. I suspect that decades or more of case law would be relevant to how this would be viewed, and I’m personally not equipped to argue it.

What I do know is that artists don’t feel good about it. They feel like they’re being taken from. And I’m not inclined to quickly dismiss their concerns. I think this needs careful, deliberate consideration. And if a system could be built that is consent based, I’d feel much better about it. A human child could be raised and mature without ever being exposed to copyrighted material beyond a handful of books (harder in the modern world but common 200 years ago). Maybe we just need to build better models. It certainly seems possible.