by cycomanic
1346 days ago
> > Is it a valid defense against copyright infringement to say “we don’t know where we got it, maybe someone else copied it from you first?”

> I mean, in humans it's just referred to as 'experience', 'training', or 'creativity'. Unless your experience is job-only, all the code you write is based on some source you can't attribute combined with your own mental routine of "i've been given this problem and need to emit code to solve it". In fact, you might regularly violate copyright every time you write the same 3 lines of code that solve some common language workaround or problem.

Aren't you moving the goalposts? This is not 3 lines; it is a 1-to-1 reproduction of a complex function that definitely has enough invention height to be copyrightable.
It doesn't change the licensing issue, but it does mean people are already copying and using copyrighted code without respecting the original license, with no AI involved.
There should be a way to reverse engineer code LLMs to see which core bits of memorized code they build on. Another, more complex option is a combination of provenance tracking and semantic hashing of every function in the code used for training. A third, non-technical option is a rethinking of IP.
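
To make the provenance-plus-hashing idea a bit more concrete, here is a rough sketch in Python. It is not real semantic hashing: it only fingerprints the syntax tree of each function after renaming arguments and local variables and dropping docstrings, so it catches renamed verbatim copies but not rewrites that behave the same. The names `ProvenanceIndex`, `function_fingerprints`, the repo name, and the license string are all made up for illustration.

```python
import ast
import copy
import hashlib
from collections import defaultdict


class _Renamer(ast.NodeTransformer):
    """Replace known local identifiers with positional placeholders."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        node.annotation = None  # type hints do not affect the fingerprint
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def _fingerprint(func):
    """Hash a function's AST after stripping names that are easy to change."""
    func = copy.deepcopy(func)
    func.name = "f"
    func.returns = None
    body = func.body
    # Drop a leading docstring, if any.
    if (len(body) > 1 and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        func.body = body[1:]
    # Only arguments and assigned names are renamed; globals and builtins
    # such as `max` are kept so that different calls hash differently.
    mapping = {}
    for node in ast.walk(func):
        if isinstance(node, ast.arg):
            mapping.setdefault(node.arg, f"v{len(mapping)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"v{len(mapping)}")
    _Renamer(mapping).visit(func)
    return hashlib.sha256(ast.dump(func).encode("utf-8")).hexdigest()


def function_fingerprints(source):
    """Yield (function name, fingerprint) for every function in `source`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, _fingerprint(node)


class ProvenanceIndex:
    """Hypothetical index: fingerprint -> where the function came from."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, source, origin, license_id):
        """Record every function in a training file with its origin and license."""
        for name, fp in function_fingerprints(source):
            self._by_hash[fp].append((origin, license_id, name))

    def lookup(self, source):
        """Map each function in `source` to the training functions it matches."""
        return {name: list(self._by_hash.get(fp, []))
                for name, fp in function_fingerprints(source)}


if __name__ == "__main__":
    training = (
        "def clamp(value, low, high):\n"
        '    """Clamp value into [low, high]."""\n'
        "    result = max(low, min(value, high))\n"
        "    return result\n"
    )
    generated = (
        "def bound(x, lo, hi):\n"
        "    out = max(lo, min(x, hi))\n"
        "    return out\n"
    )
    index = ProvenanceIndex()
    index.add(training, "example/repo", "GPL-3.0")
    print(index.lookup(generated))
```

Running a model's output through `lookup` would at least flag functions that are structurally identical to something in the training set, along with the license they came under. Anything beyond renaming (reordered statements, inlined helpers) would need a fuzzier similarity measure than an exact hash.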