| HN Mirror



	by gyomu 643 days ago
	This is a bad analogy. The factory producing crayons doesn’t need to ingest hundreds of millions of copyrighted works as a fundamental part of its process to make crayons.

2 comments

bee_rider 643 days ago

I don’t think it is a bad analogy, it is just separating out the issues.

If the thing required breaking the law to make, it just shouldn’t have been made. But, in that case, Facebook should not accept liability for how their users use the thing. They should just not share it at all, and delete it.


dartos 643 days ago

Crayons aren’t made by mashing people’s artwork through a gpu.

Crayons don’t generate content either.

If I download something from megaupload (rip) megaupload is the one that gets in trouble. They are storing, compressing, and shipping that information to me.

The same thing happens with AI; the information is just encoded in the model weights instead of a video or text encoding or whatever. When you download a model, you’re downloading a lossy compressed version of all the data it was trained on.


bee_rider 643 days ago

This seems more like an argument that the model just shouldn’t have been created, or that it shouldn’t be used. If a model is just a lossy compressed version of a bunch of infringing content, why would Facebook (or OpenAI, or anybody else hosting a model and providing an API to it) be in the clear?


camillomiller 643 days ago

To be fair, maybe yes, these models shouldn’t have been created. But they have been created, so now we need a new way to make sure they don’t damage other people’s work. Nothing like this existed before, so it needs a new set of rules, which the model creators, with all their might and power, are lobbying strongly against.


dartos 643 days ago

Tech likes to follow the “ask for forgiveness, not for permission” motto.

If OpenAI, Facebook, or whoever asked for permission to gobble up all publicly visible data to train a program to output statistically similar data, I don’t believe they would’ve got the permission.

In that sense, I don’t think these models should’ve been made.

I don’t think any of those companies would be in the clear. That’s my point.

AI is a copyright black hole, albeit a useful one.


thesz 642 days ago

Let's say a factory builds a mega puzzle from many images shredded to identically-shaped puzzle pieces so you can piece them together as you want or need. Some pieces from some images are omitted due to their closeness in image space or due to them being infrequent enough.

This 70B-piece puzzle is an LLM.

You can reproduce the likeness of any hero from the Marvel universe closely enough using this puzzle. Or you can create

Who is to blame?
