by Scion9066
871 days ago
One thing I think people forget is that the prompt used when "reproduc[ing] those same copyrighted works" is also part of why the model spits out similar things. It's not just the model doing it. A traditional artist can be led to recreate a copyrighted work in much the same way, given the right prompts.
On the other hand, if you prompt a code-generating model with a comment and a function declaration that it knows exists, and it spits out 100+ lines of nearly verbatim code, that's a different story entirely. If I prompt a human the same way, they will almost certainly write different code, even if they've seen the original source. This is partly because humans write code differently from LLMs: humans tend to iterate somewhat non-linearly, and I think if you asked the same person to write the same thing on different days, they would probably come up with different results. It would be quite rare for a human to see a familiar segment of code and then begin dumping near-verbatim copies of existing codebases.
AI models that readily bias themselves toward reproducing their training inputs do exist. It is not clear how many models actually do this, but it is definitely a huge part of the concern when people talk about copyright and model weights.
The issue is a bit clouded by people who are just generally hoping that today's AI model weights turn out to be illegal for social reasons, but that's not the position I am trying to present. (I'm not really sure what we should do regarding societal impact.)