by karpierz 1248 days ago
You're making a distinction that isn't reflected in copyright law.

If I take a JPEG of Mickey Mouse and then turn it into a PNG, it's not a "copy", as the bits are different. But it still contains copyrighted material.

You can try to argue that the bits of the PNG itself aren't an image of Mickey Mouse, but rather that the algorithm that reads the PNG produces an image of Mickey Mouse. But that isn't really a relevant distinction insofar as copyright is concerned.

In addition, this statement is false:

> The important distinction is that the model does not contain a copy of a copyrighted material.

It has been shown repeatedly that the model produces copies of training data. The copies are of course not stored as JPEGs/PNGs in the model, but they are retrievable from the model, given the correct password (prompt).


Could you provide evidence of your last statement? I haven’t seen these models produce actual copies of any art (can’t imagine that’s an option in general).

These models do not contain copies. One way to describe the data is they contain a statistical breakdown of the artwork, which is substantially different from a JPEG -> PNG conversion you mention.

> Could you provide evidence of your last statement? I haven’t seen these models produce actual copies of any art (can’t imagine that’s an option in general).

Here's a whole paper, complete with citations:

https://arxiv.org/pdf/2212.03860.pdf

> These models do not contain copies. One way to describe the data is they contain a statistical breakdown of the artwork, which is substantially different from a JPEG -> PNG conversion you mention.

I don't understand the distinction you're making. What legally separates a "statistical breakdown" representation from a zip file representation, a JPEG representation, or a PNG representation?

As I anticipated, you are referencing research that does not show exact copies being generated by Stable Diffusion. Do "semantically equivalent" images infringe on copyright? I would argue that they do not. We will see how this plays out in court.

Food for thought: if I write instructions for generating an SVG of a black square, does my program contain copyrighted material (Malevich's Black Square)? You and I could argue about that, but you will probably quote more research that disproves your own point. So let's skip that.

> As I anticipated, you are referencing research that does not show exact copies being generated by Stable Diffusion. Do "semantically equivalent" images infringe on copyright? I would argue that they do not. We will see how this plays out in court.

If you're convinced that a photo of Mickey Mouse with slightly larger ears, or slightly reddish pants isn't copyright infringement, then sure, neither is any of this stuff. It would also mean that republishing copyrighted images with a lossy compression algorithm (i.e., as JPEGs) would not violate copyright.

I would suggest looking at the actual laws around copyright instead of relying on what you feel copyright should be.

> If you're convinced that a photo of Mickey Mouse with slightly larger ears, or slightly reddish pants isn't copyright infringement, then sure, neither is any of this stuff.

Isn't this already well-established? For example, this image, used in The Simpsons:

https://static.simpsonswiki.com/images/d/d4/Mickey_Mouse.png

is clearly Mickey Mouse in intent, but not in a copyright infringing way.

To be clear, I'm not saying that you can't create Mickey Mouse images that are transformative (or that Disney might not bother suing over; I think there'd be a lawsuit if the Simpsons tried making a commercial film following the adventures of their rendition of Mickey Mouse).

Also, that usage of Mickey Mouse might be copyright infringing but fall under fair use (probably parody), which is a specific defense against a claim of copyright infringement (similar to "self defense").

What I am saying is that:

1) If your model returns images which look near-identical to your training data, then any copyright infringement that applies to the training image will also apply to your image.

2) If your model can consistently return copyrighted imagery, there's little difference between explicitly sharing those images (with a password) and implicitly sharing them (via a model + prompt).

Amazing. It’s so obvious I wonder why billion dollar corporations didn’t figure out the legal implications of these models yet. Do you have an email address I could pass on to OpenAI?