by Winsaucerer
1239 days ago
But it's not humans reading it; it's being used to train ML models. There are similarities between humans learning from books and ML models being trained on them, but there are also salient differences, and those differences lead to concerns. E.g., I am concerned about these large tech companies being the gatekeepers of AI models, and I would rather see the beneficiaries and owners of these models also be the many millions or billions of content creators who first made them possible. It's not obvious to me that the implicit permission we've been granting for humans to view our content for free also means that we've given permission for AI models to be trained on that data. You don't automatically have the right to take my content and do whatever you like with it. I have a small, inconsequential blog. I intended to make that material available for people to read for free, but I did not have (but should have had!) the foresight to think that companies would take my content, store it somewhere else, and use it for training their models. At some point I'll be putting up an explicit message on my blog denying permission to use it for ML training purposes, unless the model being trained is an appropriately open-sourced and available model that benefits everyone.
|
Actually, you don't have the right to restrict the content, except as part of what's allowed in copyright law (those rights are spelt out: distribution, broadcasting publicly, making derivative works, and so on).
Specifically, you cannot have the right to restrict me from reading the works and learning from them.
Imagine a hypothetical scenario: I bought your book, counted the words and letters to compile some sort of index/table, and published that. Not a very interesting work, but it is transformative, and thus you do not own the copyright to my index/table. You cannot even prevent me from doing the counting and publishing.