|
|
|
|
|
by rndmwlk
909 days ago
|
|
It’s disingenuous to frame using data to train a model as a “view,” of that data. The simple cases are the easy ones, if ChatGPT completely rips a NYT article then that’s obviously infringement; however, there’s an argument to be made that every part of the LLM training dataset is, in part, used in every output of that LLM. I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into being part of ML/AI training data, and I imagine that opinion would be amplified if my writing was a product which was being directly threatened by the very same models. |
|
You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.
Saw this a lot with some earlier image models where you could type in an artists name and get their work back.
The fact that AI models are having to put up guardrails to prevent that sort of use is a good sign that they weren't trained ethically and they should be paying a ton of licensing fees to the people whose content they used without permission.