
I think any answer to that question needs to be considered carefully, at least in a legal context, since it could end up having unintended consequences.

LLMs ingest works but do not regurgitate them, so the product can be considered transformative. From my understanding of these models, they do not retain the original works. (There are probably reasons for the companies to retain the original works, but that is an entirely different matter.) So equating a trained model with copyright violation is akin to suggesting that the knowledge, rather than the content, is copyrightable. Do we really want to enter that territory?

The other route of attack is via how the materials were acquired. This can create problems from several perspectives. If companies had to purchase each work in order to train a model, the process would only be accessible to very well-financed corporations, and to libraries as well, since they are essentially in the business of purchasing works (albeit for an entirely different purpose). If you allowed borrowed works to be used while training models, the notion of lending would likely come under attack. I'm not sure we want to go there either. Then there is the question of online materials that are freely available. What would protect them?

I'm not a fan of AI and I am even less of a fan of Meta. I would love to see them have the book thrown at them. I'm just uncomfortable with the potential repercussions of throwing the book at them.