by EMIRELADERO 415 days ago
> Part of my point is that you don't need to produce literally equivalent output. Again, if I record and compress "Revenge of the Sith", there's literally zero pixels shared between my recording and the actual movie. Cool, so I can go upload it for free then right? No, I can't.

That's because you would be redistributing the actual material, just in a really roundabout way. GenAI models are not that; they're not a database and don't work like one.

> Can GenAI produce indistinguishable images to what's on Getty Images?

That doesn't matter because you can't copyright a style. From the point of view of copyright law, it would look like you were copying nothing proprietary/owned at all.


> That's because you would be redistributing the actual material, just in a really roundabout way.

Right, which I’m arguing is what LLMs do just in an even more roundabout way.

The technical details of LLMs don’t actually matter. We don’t really care if they’re a database or not. The question is do they reproduce the source material? And yeah, pretty much they do, in a lot of instances. Not all, but a lot.

To produce yet another analogy, imagine I have a service X. You can pay and I will give you any movie you want. You don’t know how I do it. Is this copyright infringement or not? I would say yes. Now let’s say I reveal the secret - I open up photoshop and painstakingly recreate the movie frame by frame. I might make a mistake here or there. Is this still copyright infringement? I think it is.

> To produce yet another analogy, imagine I have a service X. You can pay and I will give you any movie you want. You don’t know how I do it. Is this copyright infringement or not? I would say yes. Now let’s say I reveal the secret - I open up photoshop and painstakingly recreate the movie frame by frame. I might make a mistake here or there. Is this still copyright infringement? I think it is.

Okay, but that is not what's happening here. Demonstrably so. The fact that a model is technically capable of overfitting to certain heavily repeated points in the training data doesn't mean the entire thing has to be shot down. The non-infringing uses far outweigh the offending ones, by a lot.

If what you say is true, and they do outright copy a lot, then it should be pretty easy for any IP holder to sue anyone who misuses the model that way for copyright infringement on those specific outputs.