This is a bad analogy. The factory producing crayons doesn’t need to ingest hundreds of millions of copyrighted works as a fundamental part of its process to make crayons.
I don’t think it is a bad analogy; it is just separating out the issues.
If the thing required breaking the law to make, it just shouldn’t have been made. But, in that case, Facebook should not accept liability for how their users use the thing. They should just not share it at all, and delete it.
Crayons aren’t made by mashing people’s artwork through a GPU.
Crayons don’t generate content either.
If I download something from Megaupload (RIP), Megaupload is the one that gets in trouble. They are storing, compressing, and shipping that information to me.
The same thing happens with AI; the information is just encoded in the model weights instead of a video or text encoding or whatever. When you download a model, you’re downloading a lossy compressed version of all the data it was trained on.
This seems more like an argument that the model just shouldn’t have been created, or that it shouldn’t be used. If a model is just a lossy compressed version of a bunch of infringing content, why would Facebook (or OpenAI, or anybody else hosting a model and providing an API to it) be in the clear?
To be fair, maybe yes, these models shouldn’t have been created.
Well, they have been created, so now we need a novel way to make sure they don’t damage other people’s work.
Nothing like this existed before, so it needs a new set of rules, which the model creators, with all their might and power, are lobbying hard against.
Tech likes to follow the “ask for forgiveness, not for permission” motto.
If OpenAI, Facebook, or whoever had asked for permission to gobble up all publicly visible data to train a program to output statistically similar data, I don’t believe they would’ve gotten it.
In that sense, I don’t think these models should’ve been made.
I don’t think any of those companies would be in the clear. That’s my point.
AI is a copyright black hole, albeit a useful one.
Let's say a factory builds a mega puzzle from many images shredded into identically shaped puzzle pieces, so you can piece them together as you want or need. Some pieces from some images are omitted, either because they are too close to other pieces in image space or because they occur too infrequently.
This 70B-piece puzzle is an LLM.
You can reproduce the likeness of any hero from the Marvel universe closely enough using this puzzle. Or you can create