by Workaccount2 667 days ago
It's only clear that training is a violation of copyright if you have a layman's understanding of how training works. There are no images stored in image models, just vectors that represent pixel relationships. You may call this fancy compression, but the ship runs aground if you try to "compress" a small set of images with a transformer: you will just get random, noisy junk as output.

Artists have much firmer legal ground to stand on if they go after model output, but the goal is to kill image generators, not simply to censor their output.

Think of it like this: if I splatter paint on a canvas, does Jackson Pollock have a copyright claim? Probably not, even though my creation is a product of training on his work. But it would be fair to check whether my creation is too similar to one of his works.


> just vectors that represent pixel relationships

Ask DALL-E 2 for the Mona Lisa and it will produce something clearly derived from the original work. The ability to recreate items from the training set depends on how these systems are trained, but they are clearly capable of retaining enough to be problematic.

The Harry Potter movies aren’t the original books. A derivative work doesn’t have to be the same as the original, just directly derived from it.

> If I splatter paint on a canvas, does Jackson Pollock have a copyright claim?

If you’re trying to copy him, then yes, he actually would. Being inspired by a technique is fine, but the difference is less subtle than you might think.

Copyright cares how something was created. If you end up with ‘random’ patterns that look suspiciously similar to another work, it’s extremely unlikely that you got there by chance. What are the odds you would pick the same 12 colors as someone else and apply them in the same order? 12 factorial (479,001,600) isn’t a small number, and that’s before considering the color selection.
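To put a number on those odds: a minimal Python sketch of the arithmetic, assuming the 12 colors are distinct and every order is equally likely. The palette size is a hypothetical value picked only for illustration; nothing in the thread specifies one.

```python
import math

# Number of distinct orders in which the same 12 colors can be applied.
orderings = math.factorial(12)
print(orderings)  # 479001600

# Chance of hitting one specific order by picking uniformly at random.
print(1 / orderings)

# Hypothetical palette size (an assumption for illustration, not from the thread).
PALETTE_SIZE = 64

# Ordered selections of 12 distinct colors from the palette: n! / (n - k)!.
ordered_choices = math.perm(PALETTE_SIZE, 12)
print(f"{ordered_choices:.3e}")
```

Choosing the colors as well as their order multiplies the 12! orderings by the number of ways to pick the set, which is why the second figure is far larger.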

Everything you said is why I believe artists have much firmer ground to stand on by going after output. We can have dumb AI that scans outputs for copyright violations the same way YouTube scans for them.
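For a sense of what "dumb" output scanning could look like, here is a minimal sketch using average hashing, a simple perceptual-hash technique. YouTube's actual matching system is not public, so this is a swapped-in technique and not how YouTube does it. The function names, the 8x8 grayscale input (assumed to be already downscaled) and the distance threshold are all hypothetical choices.

```python
from typing import List

# Hypothetical input format: an 8x8 grid of grayscale values (0-255),
# assumed to be already downscaled from the full-size image.
Image = List[List[int]]


def average_hash(img: Image) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def looks_like(candidate: Image, reference: Image, max_distance: int = 5) -> bool:
    """Flag the candidate if its hash is within max_distance bits of the reference."""
    return hamming(average_hash(candidate), average_hash(reference)) <= max_distance


# Usage: a gradient "reference work", a slightly brightened copy, and an inverted image.
reference = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[min(255, p + 3) for p in row] for row in reference]
inverted = [[255 - p for p in row] for row in reference]

print(looks_like(brightened, reference))  # True
print(looks_like(inverted, reference))    # False
```

Because the hash only records which pixels are brighter than average, small global edits like a brightness shift leave it unchanged. That tolerance is what lets this kind of check catch near-copies while ignoring unrelated images.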

Just because I can draw Spider-Man from memory doesn't mean I owe Disney money or that I am 'problematic'. It just means I have to censor my outputs when doing drawings for people.

But again, artists don't want this outcome, so there is a purposeful muddying of the waters going on.

> I just have to censor my outputs when doing drawings for people.

If any output is infringing, the model must itself be infringing by definition. The correct solution isn’t to censor the result; it’s to delete the model.