by maoeurk 1344 days ago
> There are two issues -- (1) feeding copyrighted material in to an AI model, and (2) getting copyrighted material out.

> The latter is obviously a violation of copyright, full stop.

It's not obvious to me that (2) is a violation of copyright. Unlike patents, copyright violation is not as simple to prove. My understanding is that, at least in the US, independent creation is a valid defense against copyright infringement. For example if 2 people independently write the same story and can prove that they did, they can both hold copyright over that story.

The analogue to this does exist without AI: when creating something that looks like copyright infringement, clean room design (don't look at similar things) is often done to ensure that "independent creation" can be used as a valid defense in court. Given that, I think (1) is probably not safe to do at all if you can't prevent (2).

3 comments

Get Stable Diffusion to output Mickey Mouse and see how far you get using that commercially before Disney stomps down on you hard.

Outputting copyrighted material is a violation of copyright, period. Whether that violation is enforceable depends on your means though.

And why is Mickey Mouse not in the public domain as of 2022? Therein lies the root of all these questions. The system is designed not to benefit people but to enable rent-seeking.
While I agree that copyright terms are unreasonably long, it's not relevant to this specific case.
> > There are two issues -- (1) feeding copyrighted material in to an AI model, and (2) getting copyrighted material out.

> > The latter is obviously a violation of copyright, full stop.

> It's not obvious to me that (2) is a violation of copyright. Unlike patents, copyright violation is not as simple to prove. My understanding is that, at least in the US, independent creation is a valid defense against copyright infringement. For example if 2 people independently write the same story and can prove that they did, they can both hold copyright over that story.

> The analogue to this does exist without AI: when creating something that looks like copyright infringement, clean room design (don't look at similar things) is often done to ensure that "independent creation" can be used as a valid defense in court. Given that, I think (1) is probably not safe to do at all if you can't prevent (2).

I don't think the analogue holds: the AI does have a direct view of the actual code. In the most paranoid clean room design you have two teams: one analyses the behaviour of some software and writes a specification (without seeing the source code), and the other then uses that spec to write the reimplementation.

Copilot turns that on its head: you ask it to do something, and it then looks up source code that does it and gives that to you.

> For example if 2 people independently write the same story and can prove that they did, they can both hold copyright over that story.

This is the theoretical case, but I don't think I've ever seen it actually happen in practice.