by uvnq
1124 days ago
Using this logic, every writer who learned to read and write by reading books, every artist who improved their craft by studying other works, and every musician who learned the piano by practicing pieces is also "stealing" in whatever they "originally" create, because they learned through pattern recognition from "tiny pieces of every work" in their data set. It's ridiculous to equate agents that generalize well with "stealing" pieces of the works they used to learn those generalizations. Obviously, if an artist memorizes a painting in their data set and reproduces it, or an AI spits out the exact image instead of an original work based on what it has learned, then that is theft. But generalization is not theft, at least in my view. To assume otherwise leads to some very dysfunctional logical conclusions.
|
Also, humans can be, and often are, found liable for copyright infringement or piracy depending on how they conduct themselves. If a human were to reproduce a copyrighted book word for word, that would constitute copyright infringement regardless of whether it was done by rote memory, by copy and paste, or with the help of a black-box LLM. Even if a human paraphrases another work, they can still be found guilty of plagiarism if the paraphrase is overly similar to the original source material. A human can also be guilty of copyright infringement if they use a copyrighted work as source material in certain ways. If I steal a stock image without paying for a license and add it to my Photoshop collage, I might be found to have pirated or infringed on the original image creator's property.
LLMs are trained on copyrighted data and can often reproduce that copyrighted data. It's an open question how we regulate this.
I personally think it would be fair for an artist or author to say their work was not licensed for use in training a neural net, or to otherwise request to opt out.