by polotics 674 days ago
No, the <10GB size of the model does not imply that any less copyright infringement is occurring, IMHO. Very efficient compression being involved does not change the fact that an uncompressed copy of the copyrighted material was input into the process that generated the model, in breach of that material's copyright.

The training process doesn't involve making any copies, at least no more than viewing an image on the internet copies it into your RAM.

Transformers analyze images; they don't copy them. You might call this semantics, but you probably also wouldn't call out an algorithm that counts black pixels on website images as a "copyright violation".
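
For what it's worth, here is a rough sketch of the kind of pixel counter I mean. It is plain Python, and the image format (a nested list of RGB tuples) and the function name `count_black_pixels` are made up for illustration, not taken from any real library:

```python
# Hypothetical illustration: an image is a list of rows,
# and each row is a list of (r, g, b) tuples with values 0-255.
def count_black_pixels(image, threshold=0):
    """Count pixels whose three channels are all <= threshold."""
    return sum(
        1
        for row in image
        for (r, g, b) in row
        if r <= threshold and g <= threshold and b <= threshold
    )


# A made-up 2x2 image: two black pixels, one white, one greenish.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (12, 200, 7)],
]

print(count_black_pixels(image))  # 2
```

The only thing that comes out is a single number, and you can't rebuild the image from it.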

There is a lot of nuance here and a lot to consider. Transformers are not archives of images, they are archives of relationships. This is key because you don't have to copy an image to measure the relationships between its pixels.

Train a transformer on one image, and it will just output noisy garbage.

Is the concern that the output weights infringe on copyright, or that the training material itself was obtained and used in a manner inconsistent with copyright law?
The concern is that AI will be better than artists at making art, and artists don't want their art to be part of the tool set for creating that AI.

This is a totally new situation for humanity that almost no one saw coming. So artists are forced to use the lone, outdated weapon they have: copyright claims.