by quitit 1123 days ago
Setting the precedent that training from materials = theft seems pretty scary to me. First, because it redefines learning as stealing, and second, because it does so without proving that the source material authors are in some way deprived of something, in a way that differs from a human learning from their materials and producing content with that knowledge.

Let's say the AI was used to generate illegal content. If these words/images are truly non-transformative and still the property of those whose work the model was trained on, this would be a pretty grim scenario, since the original authors would own the illegal output. It seems much more reasonable that the person who prompts the system to build such content would be responsible, and thus the true owner of the output.

For this discussion it's useful to keep in mind that ChatGPT and other AI tools don't spontaneously create content; they create it in response to a human "query". It's also the human who decides whether or not the material is useful and suitable (it is often not accurate, truthful or useful).

From here it seems more like a discussion about plagiarism and copyright, but both of these fall outside the scope of the article. I feel authors haven't taken this angle because the end materials are reasonably different from the sources (notwithstanding memorisation effects).

I do agree with the sentiment that ChatGPT isn't intelligent (but AI has never claimed to reproduce true intelligence). I prefer the tongue-in-cheek description of "spicy autocorrect" as a fairer representation of its capability.

1 comment

I was thinking about this further, as there's a lot of grey area in the idea of who owns the words. After all, everyone is using the same words, just in different combinations. But what about words which aren't in the lexicon, specifically unique trademarks? These are entirely invented words, e.g. "kleenex" and so on, and they are traceable to the trademark owner.

By the standard that training from materials = theft, any reference to one of these unique trademarks would be interesting and highly problematic. AI wouldn't be allowed to write any kind of non-editorial text that uses unique trademarked product names without it being criminal.