|
|
|
|
|
by whitfieldsdad
1022 days ago
|
|
I've received a lot of flak for this answer in other communities, but, if a statistical model is producing purely derivative works using a mathematical model that's basically a next best token predictor, is it really "stealing"? Is it "stealing" to have a working understanding of the next best token, or even simply the token that shows up the most often (e.g. on GitHub)? I'm sure that the argument could be made that all AI should be illegal as all ideas worth having have already been had, and all text worth writing has already been written, but, where would that leave us? (e.g. your function for converting a string from uppercase to lowercase will probably look like a function that someone else on Earth has written, and the same goes for your error handling code, your state of the art technique for centering a div, etc.) |
|
If I train a model that given the input "When Mr. Bilbo Baggins" produces the entirety of The Lord of the Rings trilogy and release it, I have probably infringed copyright.
If I train a model that produces some generic paragraphs about "mountains" and "dragons" but contains no meaningful direct quotes or phrases, then that probably isn't a violation on its own. Those words appear in Tolkien's works but are not themselves enough to copyright.
If to train that model it is demonstrated that I copied Tolkien's works in a way not allowed for by the copyright license, (ie buying the book once and copying their text thousands of times across servers to train an AI model) then perhaps I have violated copyright in the interim steps even if the output of my model is no longer consider a copy of the original works.
I don't think there are black and white answers here. At one point does a chopped up and statisticized copyrighted work become no longer a copyrighted work? Can you train a model on something without first copying that thing in a way that violates copyright law?
These are squishy human concepts that get decided by humans in courtrooms and legislative bodies. I don't think the details of the math involved are going to make a big difference in the eventual outcomes.