> We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set
That's kind of a useless stat when you consider that the code it generates makes use of your existing variable/class/function names when adapting the code it finds.
I'm not a lawyer, but I'm pretty sure I can't just bypass GPL by renaming some variables.
If we could answer those questions definitively, we could also put lawyers out of a job. There’s always going to be a legal gray area around situations like this.
You could probably normalize the identifiers so the names become irrelevant, and ignore non-functional whitespace, so that only the canonical code C remains. Then maybe hash all the training data D the same way and check whether hash(C) is in hash(D). Some sort of Bloom filter...
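To make that idea concrete, here is a rough sketch in Python, not a description of how any real tool does it. `normalize` and `BloomFilter` are names I made up, and the filter size and hash count are arbitrary. It renames identifiers in order of first appearance, drops comments and blank lines, and then tests membership of the canonical form in a Bloom filter built from the "training" snippets.

```python
import hashlib
import io
import keyword
import tokenize

# Layout tokens that carry meaning in Python are kept as fixed markers.
MARKERS = {tokenize.NEWLINE: "<NL>", tokenize.INDENT: "<IN>", tokenize.DEDENT: "<DE>"}
# Comments, blank-line breaks and the end marker are non-functional here.
DROPPED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def normalize(source: str) -> str:
    """Canonical form: identifiers renamed v0, v1, ... by first appearance."""
    names = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in DROPPED:
            continue
        if tok.type in MARKERS:
            out.append(MARKERS[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Assumption: every non-keyword name is renamed, builtins included.
            out.append(names.setdefault(tok.string, f"v{len(names)}"))
        else:
            out.append(tok.string)
    return " ".join(out)


class BloomFilter:
    """Minimal Bloom filter: false positives possible, false negatives not."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


# Hypothetical stand-in for the training set D.
training = ["def add(a, b):\n    return a + b\n"]
seen = BloomFilter()
for snippet in training:
    seen.add(normalize(snippet))

# Same code with different names and an extra comment.
suggestion = "def total(x, y):  # renamed\n    return x + y\n"
print(normalize(suggestion) in seen)
```

This only catches whole-snippet matches. To find a copied function buried in a larger suggestion you would presumably hash sliding windows of tokens instead, and a Bloom filter hit would still need confirming against the real data because of false positives.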