Hacker News
by schneidmaster 1809 days ago
Again, not a lawyer, just a guy who likes reading this stuff. The devil is usually in the details of copyright cases. The Turnitin case hinged substantially on whether Turnitin's use of copyrighted essays was "fair use". There are four factors[0] which determine fair use; the two most relevant factors here are "the purpose and character of your use" and "the effect of the use upon the potential market". The court found that Turnitin's use was highly "transformative" (meaning they didn't just republish essays, for example; they transformed the copyrighted material into a black-box plagiarism detection service) and also found that Turnitin's use had minimal effect on the market (this is where "computers don't count" comes in -- computers reading copyrighted material don't affect the market much because a computer was never going to buy an essay).

I would be shocked if GitHub's lawyers didn't argue that using copyrighted material as training data for an AI model is highly transformative. There may be snippets available from the original but they are completely divorced from their original context and virtually unrecognizable unless they happen to be famous like the Quake inverse square root algorithm. And I think GitHub's lawyers would also argue that Copilot's use does not affect the _original_ market -- e.g. it does not hurt Quake's sales if their algorithm is anonymously used in a probably totally unrelated codebase.

Your counterexample would probably fail both tests -- it's not transformative use if your software hands out complete pieces of copyrighted software, and it would definitely affect the market if Copilot gave me the entire source code of Quake for my own game.

[0]: https://fairuse.stanford.edu/overview/fair-use/four-factors


I thought I understood fair use but turns out I was wrong...

That being said, creating a transformative work from something else is considered fair use. So, for example, if I read a whole bunch of books and then, heavily influenced by them, create my own, similar book, that would be fair use I suppose... that makes sense.

But where do derivative works come in? Where do you draw the line?

If I am heavily influenced by billions of lines of other people's GPL code (à la Copilot!), then create my own tool from it and keep my code hidden, does that not mean I am abusing the GPL?

That's what I meant by the devil being in the details -- these gray area questions hinge on the specific facts. Lawyers on both sides will argue which factors apply based on past caselaw and available evidence, and the court renders a decision. For example, from the Stanford webpage I previously linked: "the creation of a Harry Potter encyclopedia was determined to be “slightly transformative” (because it made the Harry Potter terms and lexicons available in one volume), but this transformative quality was not enough to justify a fair use defense in light of the extensive verbatim use of text from the Harry Potter books". So you might be okay creating a Harry Potter encyclopedia in general, but not if your definitions are copy/pasted from the books, but you might still be okay quoting key lines from the books if the quotes are a small portion of your encyclopedia. The caselaw just doesn't lend itself to firm lines in the sand.
If you read a bunch of books and then create a similar book, that isn't transformative. Transformative is more like reading a bunch of books and then creating a machine translation service. The point of "transformative" is roughly that the new work isn't going to conflict with the market for the original or compete with it in any way.