by Enhex 1820 days ago
It doesn't have to be exact to be copyright infringement; see non-literal copying. The basic idea is that if you copy-paste code and rename the variables, that doesn't make it new code.

Yeah, you'd have to assume they are parsing and normalizing this data in some way. There would still be some AST patterns or something similar you could look for in the same way, but it would be much trickier.
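
To make the "AST patterns" idea concrete, here is a minimal sketch in Python using the standard `ast` module. It is only an illustration of the general technique, not a claim about how any real tool or model works; `fingerprint` and `_Normalizer` are hypothetical names made up for this example. It replaces every identifier with a positional placeholder, so two snippets that differ only in naming produce the same normalized tree.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Rename identifiers to placeholders (v0, v1, ...) in order of first use.

    Hypothetical helper for illustration. Builtins such as `len` are renamed
    too, and attribute names (obj.attr) are left alone.
    """

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # Reuse the placeholder if this name was already seen.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        self.generic_visit(node)  # also normalize any annotation
        return node

    def visit_FunctionDef(self, node):
        node.name = self._placeholder(node.name)
        self.generic_visit(node)
        return node


def fingerprint(source):
    """Return a string form of the AST with all identifiers normalized."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)  # line/column attributes are omitted by default


original = "def total(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
renamed = "def sum_all(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"

print(fingerprint(original) == fingerprint(renamed))  # True
```

This only catches straight renames. Reordered statements, inlined helpers, or a `for` loop swapped for a comprehension would all change the tree, which is where it gets much trickier.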

Plus considering this is a legal issue ... good luck with "there is a statistically significant similarity in AST outputs related to the most unique sections of this code base" type arguments in court. We're currently at the "what's an API" stage of legal tech understanding.

The real question is whether it constitutes a derived work, though. And that is not so much a question of similarity as of provenance: if you start with a codebase that was originally GPL, and it gets gradually modified to the point where it doesn't really look anything like the original, it's still a derived work, and is still subject to the license.

Similarity can be used to prove derivation, but it's not the only way to do so. In this case, all the code that went into the model is (presumably) known, so you don't really need any sort of analysis to prove or disprove it. It is, rather, a legal question: whether the definition on the books applies here or not.