by yokem55 884 days ago
That 'usage' is taking statistical notes about the work (creating factual statements about the work) and inputting those notes into a database, averaged with a few billion other notes about other works. That is a usage that copyright under current law simply doesn't cover or protect against. It doesn't even need a 'fair use' analysis, because there's no copying or public performance happening in the creation/training of the model.

Where infringement arguably can happen is when that model is used to generate content. If the user prompts it to regenerate a protected work, then that is where the infringement happens. But not before. Maybe the various AI services can adequately guard against that illicit usage. Maybe not. And if not, it's those live services that would need to be shut down.

But creating and training a model, and even distributing that model for people to use privately on their own computers, does not constitute copyright infringement.

1 comment

> creating factual statements about the work

All digital content is statements about the work. Pictures are pixels with facts about colours. Books are characters describing the words.

AI models are compressed versions of those.

Downloading, ingesting into models, and redistributing them is copyright infringement.

Even if they weren’t compressed, downloading them without permission is not permitted. Every now and then you can read about AI criminals discussing techniques for illegally acquiring books, images and other protected content, even here on HN. It wouldn’t be an issue if it were licensed content, or a dudette training her local AI. These are companies that want to monetise people’s work. Somehow, just like thieves, they feel entitled to snatch someone else’s property.