> No, see Authors Guild v. Google. That case required that the output be transformative, in that "words in books are being used in a way they have not been used before". Copilot only fits the transformative aspect if it is not directly reciting code that already exists in the form that it is redistributing. So long as it does so, it fails to meet the criteria.
There are two separate things to analyze here:
1. The act of training Copilot on public code
2. The resulting use of Copilot to generate presumably new code
#1 is arguably close to the Authors Guild v. Google case. You are literally transforming the input code into an entirely new thing: a series of statistical parameters determining what functioning code "looks like". You can use those parameters to generate a whole bunch of novel and useful code sequences, rather than just feeding the model parts of its training data and acting shocked that it remembered what it saw. That smells like fair use to me.
#2 is where things get dicier: just because it's legal to train an ML system on copyrighted data doesn't mean that its resulting output is non-infringing. The network itself is fair use, but the code it generates would be used in an ordinary commercial context, so you wouldn't be able to make a fair use argument there. This is the difference between scanning a bunch of books into a search engine and copying a paragraph out of the search engine into your own work.
(More generally: fair use is non-transitive. Each reuse triggers a new fair use analysis of every prior work in the chain, because each fair reuse creates a new copyright around what you added, while the original copyright still remains.)