by 542458
1775 days ago
I think you’re missing that the law considers intent. If the devs of Copilot were not trying to set up infringement, then their algorithm’s output is likely not considered infringement [1]. However, if you set out to “launder” copyrighted material, then the law will take that into consideration and likely find that you violated copyright. This intent can be demonstrated in court via either your statements or your actions (such as constructing a meaninglessly tiny training set).
[1]: https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright...
|
It seems to me that the licensing is the part you can't legally throw into a big Markov chain. Even if they aimed only at open-source-licensed material without exception, the point where they discard all the licenses and export a 'generic' slurry is the point where they infringe by definition. If they trained on more restrictive licenses, that's just doubling down. What's needed is annotation and maintenance of which bits of code came from which licensing pool. You could well have a giant pool of GPL and a giant pool of MIT (which I would be in, all the more since I maintain a very automatable code style that's easy to import from). You could accumulate a list of sources for anything you did, at whatever level of granularity is desired.
Throwing away this attribution shows intent to infringe. It's constructing a machine for the explicit purpose of grinding code into a sludge of pieces intentionally small enough that, if you reconstruct copyrighted code in your Markov-chainy way, you've got grounds for pretending you didn't build your machine to do exactly that.
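To make the "licensing pool" idea concrete, here is a rough sketch of what keeping attribution attached to training snippets might look like. Everything in it is hypothetical: the `Snippet` and `LicensePools` names, the example sources, and the verbatim-substring matching are assumptions for illustration, not how Copilot or any real tool works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    text: str     # the code fragment itself
    source: str   # hypothetical origin, e.g. a repository name or URL
    license: str  # license identifier, e.g. "MIT" or "GPL-3.0"


class LicensePools:
    """Groups snippets by license so attribution is kept, not discarded."""

    def __init__(self):
        self._pools = defaultdict(list)

    def add(self, snippet):
        # File each snippet under its own license pool.
        self._pools[snippet.license].append(snippet)

    def pool(self, license):
        # Return a copy of the snippets stored under one license.
        return list(self._pools.get(license, []))

    def sources_for(self, output):
        # Assumption: a snippet "contributed" if it appears verbatim in the
        # output. A real system would need a much fuzzier notion of matching.
        return sorted(
            {(s.source, s.license)
             for snippets in self._pools.values()
             for s in snippets
             if s.text in output}
        )


pools = LicensePools()
pools.add(Snippet("x = 1", "repo-a", "MIT"))
pools.add(Snippet("y = 2", "repo-b", "GPL-3.0"))
print(pools.sources_for("x = 1\nprint(x)"))
```

The granularity is whatever you choose a snippet to be: a file, a function, or a single line.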