by clintfred
1087 days ago
Genuine question here; not trying to be snarky. How is AI "reading" code different from me reading code? Is the difference the AI's capacity for perfect memory? I can read open source code (even GPL code) and not have all future code I independently write be subject to that license. I don't think anyone would argue that I immediately "forget" any OSS code that I read, so it becomes part of the structure of my brain (and potentially influences future code I write), but unless I'm linking to the code or copying pieces out verbatim, I'm generally in the clear. Of course there are some sticky situations, such as clean-room design and reverse engineering, but those seem like pretty narrow examples.
(I’m not really interested in arguing whether that’s all they do, or whether it’s the purpose of LLMs; those details are just a distraction from the original question: what makes LLM training different from a human reading code?)
If the model has memorized the training set and can reproduce it verbatim when prompted, then it should be incumbent on the AI owner to prove that it does not reproduce copyrighted code when it is not explicitly prompted to.