by Workaccount2 670 days ago
>and that someone feeding petabytes of copyrighted material into an AI model to fully automate generation of art is obviously copyright infringement.

It becomes a little less obvious when you learn that the models which had petabytes of images "go into it" are <10GB in size.
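To put rough numbers on that size gap, here is a back-of-the-envelope sketch. It assumes decimal units, takes 1 PB as the low end of "petabytes" and 10 GB as the upper bound on model size. The 500 KB average image size is purely a hypothetical figure for illustration, not something stated in the thread.

```python
# Back-of-the-envelope ratio between training data and model size.
# Assumptions: decimal units, 1 PB of training images, a 10 GB model.
PETABYTE = 10**15
GIGABYTE = 10**9

training_bytes = 1 * PETABYTE   # low end of "petabytes"
model_bytes = 10 * GIGABYTE     # upper bound from "<10GB"

ratio = training_bytes // model_bytes
print(f"training data is at least {ratio:,}x larger than the model")

# Hypothetical: suppose the average training image is 500 KB.
avg_image_bytes = 500_000
num_images = training_bytes // avg_image_bytes
bytes_per_image = model_bytes / num_images
print(f"{num_images:,} images -> {bytes_per_image:.1f} bytes of model per image")
```

Under those assumptions the model has only a handful of bytes of capacity per training image, which is the intuition behind "there are no image files in here". It says nothing either way about whether particular images can still be reproduced from the weights.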

You have 5 million artists on one hand saying "My art is in there being used" and you have a 10GB file full of matrix vectors saying "There are no image files in here" on the other. Both are kind of right. ish. sort of.

2 comments

No, the <10GB size of the model does not imply that any less copyright infringement is occurring, IMHO. The fact that very efficient compression is involved does not change the fact that a copy of the copyrighted material, that copy being not compressed in any way, was input into the process that generated the model, in breach of the copyrighted material's copyright.
The training process doesn't involve any copies being made. At least, no more than viewing an image on the internet copies it into your RAM.

Transformers analyze images, they don't copy them. You might call this semantics, but you probably also wouldn't call out an algorithm that counts black pixels on website images as a "copyright violation".
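For concreteness, a pixel counter of the kind described might look like the minimal sketch below. The function name `count_black_pixels`, the `threshold` parameter, and the tiny hand-written image are all hypothetical, and the image is represented as nested lists of (R, G, B) tuples so that no imaging library is assumed.

```python
def count_black_pixels(pixels, threshold=0):
    """Count pixels whose R, G and B channels are all <= threshold."""
    count = 0
    for row in pixels:
        for r, g, b in row:
            if r <= threshold and g <= threshold and b <= threshold:
                count += 1
    return count


# Hypothetical 2x3 image as rows of (R, G, B) tuples.
image = [
    [(0, 0, 0), (255, 255, 255), (0, 0, 0)],
    [(12, 40, 200), (0, 0, 0), (255, 0, 0)],
]

black = count_black_pixels(image)
print(black)  # 3
```

The function reads every pixel but keeps only a single integer, so the image cannot be rebuilt from its output. A trained model retains far more than one number per image, so this only illustrates the "measuring, not storing" end of the spectrum.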

There is a lot of nuance here and a lot to consider. Transformers are not archives of images, they are archives of relationships. This is key because you don't have to copy an image to measure the relationships between its pixels.

Train a transformer on one image, and it will just output noisy garbage.

Is the concern that the output weights infringe on copyright, or that the training material itself was obtained and used in a manner inconsistent with copyright law?
The concern is that AI will be better than artists for making art, and artists don't want their art to be part of the tool set for creating that AI.

Totally new situation for humanity that almost no one saw coming. So artists are forced to use the outdated and lone weapon they have: copyright claims.

is distributing a zip file of copyrighted material infringement? if it is, I guess the argument is that distributing this <10GB model that can _unzip_ into copyrighted material is infringement too.

disclaimer: I'm just playing devil's advocate. I don't believe this discussion is productive. the time when IP protection was necessary for social good has passed, and now it's just a time-wasting idea