by freshhawk 1821 days ago
Yeah, you'd have to assume they are parsing and normalizing this data in some way. There would still be some AST patterns or something similar you could look for in the same way, but it would be much trickier.
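To make the "AST patterns" idea concrete, here is a minimal sketch, assuming Python source and the standard-library `ast` module. The names `_Normalize` and `fingerprint` are made up for illustration and do not describe any real tool's method. It renames every identifier to a positional placeholder, so two snippets that differ only in naming and formatting produce the same structural dump. Anything that has been parsed, normalized, and regenerated would need a much fuzzier comparison than this exact match.

```python
import ast


class _Normalize(ast.NodeTransformer):
    """Replace every identifier with a placeholder based on first appearance."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        # The same original name always maps to the same placeholder.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


def fingerprint(source):
    """Return a structural dump of the code with identifiers normalized."""
    tree = _Normalize().visit(ast.parse(source))
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(tree)


original = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)
renamed = (
    "def sum_items(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)
different = "def total(xs):\n    return len(xs)\n"

print(fingerprint(original) == fingerprint(renamed))    # True
print(fingerprint(original) == fingerprint(different))  # False
```

An exact match like this only catches trivial renames; it says nothing about where the code came from.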

Plus, considering this is a legal issue ... good luck with "there is a statistically significant similarity in AST outputs related to the most distinctive sections of this code base" type arguments in court. The legal system's understanding of tech is currently at the "what's an API" stage.

1 comment

The real question is whether it constitutes a derived work, though. And that is not so much a question of similarity as of provenance: if you start with a codebase that was originally GPL, and it gets gradually modified to the point where it no longer looks anything like the original, it's still a derived work, and is still subject to the license.

Similarity can be used to prove derivation, but it's not the only way to do so. In this case, all the code that went into the model is (presumably) known, so you don't really need any sort of analysis to prove or disprove derivation. Rather, it is a legal question: whether the definition on the books applies here or not.